Daniel J. Ford

Abstract 

This report introduces the alpha model. The alpha model is a
one-parameter family of probability models on cladograms (binary
leaf-labeled trees) which interpolates continuously between the Yule,
Uniform and Comb distributions. The single parameter α varies from 0
to 1, with α = 0 giving the Yule model, α = 1/2 the Uniform, and α = 1
the Comb. For each fixed α, the alpha model is a sequence,
{P_n}_{n ∈ N}, with P_n a probability on cladograms with n leaves.
This sequence is sampling consistent, roughly meaning that choosing a
random tree from P_n and deleting k random leaves gives a random tree
from P_{n-k}. It is also Markovian self-similar. The only other known
family with these properties is the beta model of Aldous. An explicit
formula is given to calculate the probability of a given tree shape
under the alpha model. Statistics such as the expected depth of a
random leaf are shown to be O(n^α) for α ≠ 0. The number of cherries
on a random alpha tree is shown to be asymptotically normal with known
mean and variance. Finally the shape of published phylogenies is
examined, using trees from Treebase.
1 Introduction

This report introduces a family of probability models on cladograms, collec- 
tively called the alpha model. Each model consists of a sequence of proba- 
bilities, one for each size of tree, which is Markovian self-similar and deletion 
stable. The family of models is parameterized by a single number α ∈ [0, 1]
and interpolates continuously between the three most popular models on 
cladograms called the Yule, Uniform and Comb models. Analogous families 
of models are defined for several other types of tree. 

A cladogram is a rooted binary tree with n leaves labeled 1 up to n, a
root vertex and n - 1 internal vertices. Cladograms are used in biological
systematics to represent the evolutionary relationship between n species. They
are sometimes called phylogenetic trees, although some authors reserve this 
term for cladograms with edge lengths. 

The three most popular probability models on cladograms are the Yule 
model, the Uniform model and the Comb model. The Yule model is also 
referred to as the neutral evolution model. The Uniform model assigns the 
uniform probability measure to cladograms of each size. The Comb model 
assigns probability 1 to the most asymmetric tree of each size. 

These have the property that they are deletion stable, also called sampling 
consistent, and Markovian self-similar. Informally, deletion stability means 
that deleting a random leaf from a random tree with n leaves gives a random 
tree from the same model with n - 1 leaves. Markovian self-similarity means
that the subtree below an edge is distributed independently according to the 
same model. Symmetry under permutation of leaf labels is also desirable. 

Previously, David Aldous introduced a one-dimensional continuous
family of models, collectively called the beta model ( (Ij , [S] , ) , which in- 
terpolates between the Yule, Uniform and Comb models. These are also 
deletion stable and Markovian self-similar, and display qualitatively different
behaviors for different values of the parameter β.

The alpha model introduced here has a very simple definition which allows 
many of its properties to be exactly calculated for finite values of n. Basically, 
leaves are inserted one after another until the desired number is reached.
When the current tree has n leaves, a new leaf is inserted at a given
internal edge with probability α/(n - α) and at a given leaf edge with
probability (1 - α)/(n - α). Setting α = 0 results in the Yule model,
α = 1/2 gives the Uniform model and α = 1 the Comb model.
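
The insertion scheme just described can be simulated directly. The
following Python sketch is illustrative only: the names Node,
grow_alpha_tree, count_leaves and height are hypothetical, and it assumes
that in a tree with k leaves each leaf edge carries weight 1 - alpha and
each internal edge (the root edge included) carries weight alpha, so that
the weights sum to k - alpha. It also assumes that growth starts from the
two-leaf tree and it ignores leaf labels and the order of children.

```python
import random

class Node:
    """A vertex of a rooted binary tree; a leaf has no children."""
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

    def is_leaf(self):
        return self.left is None

def grow_alpha_tree(n, alpha, seed=None):
    """Grow a tree shape with n >= 2 leaves by inserting leaves one at a time."""
    rng = random.Random(seed)
    top = Node(Node(), Node())       # two-leaf starting tree (assumption)
    leaves = [top.left, top.right]   # lower ends of the leaf edges
    internals = [top]                # lower ends of internal edges, root edge included
    while len(leaves) < n:
        k = len(leaves)
        # Assumed weights: (1 - alpha) per leaf edge, alpha per internal edge.
        if rng.random() * (k - alpha) < k * (1 - alpha):
            i = rng.randrange(k)
            v = leaves[i]
            v.left, v.right = Node(), Node()   # old leaf and new leaf hang below v
            leaves[i] = v.left
            leaves.append(v.right)
            internals.append(v)
        else:
            v = rng.choice(internals)
            old = Node(v.left, v.right)        # the subtree that used to sit at v
            v.left, v.right = old, Node()      # v becomes the new branch point
            internals.append(old)
            leaves.append(v.right)
    return top

def count_leaves(t):
    return 1 if t.is_leaf() else count_leaves(t.left) + count_leaves(t.right)

def height(t):
    """Number of edges from t down to its deepest leaf."""
    return 0 if t.is_leaf() else 1 + max(height(t.left), height(t.right))
```

Under these assumptions, alpha = 1 sends every insertion to an internal
edge, so the sketch always returns a comb, while alpha = 0 only ever
chooses leaf edges.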

Section 2 introduces the necessary basic definitions and results about 
trees. Four particular types of tree are defined: cladograms, fat cladograms, 
tree shapes and fat tree shapes. They are related by the maps which forget
leaf labels or the ordering of children. The operation of joining two trees at 
the root is also defined. 

In Section 3 the alpha model is defined. In fact, a model is defined for 
each of the four types of tree discussed in Section 2. These are related by 
the operations of forgetting leaf labels or ordering of children. Markovian 
self-similarity and deletion stability are defined, and the alpha model shown 
to have these properties. Necessary and sufficient conditions for a Markovian 
self-similar model to be deletion stable are also derived. The probability of a 
tree under a Markovian self-similar model is calculated for each type of tree 
and these results applied to the alpha model. 

Next, the alpha model is shown to pass through the Yule, Uniform and 
Comb models. The beta model is also briefly described and shown to be dif- 
ferent from the alpha model except where they intersect at the Yule, Uniform 
and Comb models. 

In Section 4 two statistics on trees are discussed. These are Sackin's index
and Colless' index. Sackin's index is the sum of the distances from each leaf
to the root, and Colless' index is the sum of the absolute differences between
the number of leaves to the left and right of each branch-point. These are shown to differ by
at most | log 2 n on a rooted binary tree with n leaves. For the alpha model, 
Sackin's index is shown to be O(n^{1+α}) for α ∈ (0, 1]. Thus the covariance of
Sackin's index and Colless' index is asymptotically 1 for α ∈ (0, 1]. The case
of the Yule model, α = 0, has been studied before. In that case both Sackin's
and Colless' index are O(n log n) with known constants and covariance.
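
As a small illustration of the two indices, the sketch below computes both
on trees written as nested pairs. The nested-pair representation and the
name sackin_colless are assumptions made for this example only, and depth is
measured from the first branch-point, so the root edge is not counted
(counting it would add n to Sackin's index).

```python
def sackin_colless(tree):
    """Return (number of leaves, Sackin's index, Colless' index).

    A leaf is any non-tuple value; a branch-point is a pair (left, right).
    """
    if not isinstance(tree, tuple):
        return 1, 0, 0
    left, right = tree
    n_left, s_left, c_left = sackin_colless(left)
    n_right, s_right, c_right = sackin_colless(right)
    n = n_left + n_right
    # Every leaf below this branch-point is one edge further from the root.
    sackin = s_left + s_right + n
    # This branch-point contributes the imbalance between its two sides.
    colless = c_left + c_right + abs(n_left - n_right)
    return n, sackin, colless

comb = (((1, 2), 3), 4)        # most asymmetric tree with 4 leaves
balanced = ((1, 2), (3, 4))    # most symmetric tree with 4 leaves
print(sackin_colless(comb))      # (4, 9, 3)
print(sackin_colless(balanced))  # (4, 8, 0)
```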

Another statistic for cladograms or binary trees is the number of cherries,
addressed in Section 5. A cherry is a pair of leaves which are adjacent to each
other. McKenzie and Steel ^B] have shown that for the Yule and Uniform 
models the number of cherries is asymptotically normal, with known mean 
and variance. These results are extended in Section 5 to show that, for any
α ∈ [0, 1), the number of cherries in a random tree from the alpha model is
asymptotically normal with known mean and variance. The Comb model,
α = 1, is deterministic with exactly one cherry for a comb tree with at least
2 leaves.
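
Counting cherries is a short recursion. The sketch below is only an
illustration: it assumes trees are written as nested pairs with non-tuple
leaves, and the name count_cherries is hypothetical.

```python
def count_cherries(tree):
    """Number of branch-points whose two children are both leaves."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    if not isinstance(left, tuple) and not isinstance(right, tuple):
        return 1
    return count_cherries(left) + count_cherries(right)

print(count_cherries((((1, 2), 3), 4)))   # comb tree: 1
print(count_cherries(((1, 2), (3, 4))))   # balanced tree: 2
```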

Section 6 looks at the shape of published phylogenies. Despite the increase
in published phylogenies, this appears to be the first systematic study of the
shape of a large number of published phylogenetic trees, perhaps with the 
exception of fH] . 

Natural questions to ask about the shape of cladograms or phylogenetic
trees include: Are they symmetrical and flat, or asymmetrical and deep? Is 
there systematic bias in reconstruction algorithms? The trees analyzed are 
those in Treebase, a free database of published phylogenies. In the past, a
major stumbling block was the lack of a measure of imbalance which could 
be compared across trees of different sizes, see ^3] for example. Fortunately, 
the maximum likelihood estimate of α is such a measure of imbalance.

All binary trees from Treebase (as of Nov. 2004) are analyzed and their 
shapes compared using the alpha model. A variety of statistics are used to 
consider the goodness of fit of the alpha model to this data. Two common 
models for cladograms are the Yule and Uniform. It has often been noted 
that published trees tend, on average, to be less balanced than Yule trees but 
more balanced than Uniform trees. This observation is verified and quantified 
for a large set of trees. 

This analysis of Treebase was carried out in November 2004 and presented
at the Annual New Zealand Phylogenetics Conference in February
2005, along with a brief summary of Sections 2-5.

Finally, I would like to thank my advisors Persi Diaconis and Susan 
Holmes who have offered much guidance and support. This work forms part 
of my PhD thesis and grew out of a homework exercise in a combinatorics 
class of Persi's. The analysis of Treebase was suggested by Susan Holmes. 
This work was supported in part by NSF award #0241246 (Principal inves- 
tigator Susan Holmes). 

2 Basic definitions and constructions for trees 

The basic objects discussed throughout this work are trees. These will usually 
have a root vertex and leaf labels. 

Trees will be thought of as growing down from the root. The descendents 
of a vertex are those vertices further from the root, and the ancestors those 
which are closer to the root. The parents and children of a vertex are those 
vertices immediately above and below, respectively. 

Figure 1: The same thin cladogram, but different fat cladograms 



Some trees are also 'fat trees', in which case the ordering of the children
of each vertex is important. Thus, for fat binary trees it makes sense to talk
of the left and right child, and the left and right subtree below a vertex. For
thin (non-fat) trees, the children of a vertex are not ordered. So, for example,
in Figure 1 the two diagrams represent the same thin tree, but different fat
trees.

Isomorphisms between trees are what you might expect: graph isomor- 
phisms which preserve any additional structure. Isomorphic trees are con- 
sidered equal. 

The obvious forgetful maps which forget either labelings, or the ordering 
/ orientation in fat trees, will also be used. 

The four main types of trees considered here are:

• tree shapes, which are unlabeled binary rooted trees; 

• cladograms, which are tree shapes where the n leaves have distinct 
labels 1 up to n; 

• fat tree shapes, which are tree shapes where the children of each vertex 
are ordered; 

• fat cladograms, which are cladograms where the children of each vertex 
are ordered. 

The forgetful maps send each of these types to another. 

The symmetric group on a labeling set acts in the obvious way on a 
leaf-labeled tree: by permuting the leaf labels. 
Also, the subtree below an edge is defined to be the subtree consisting of
all vertices and edges below and including the specified edge. The root join 
of two trees is the tree formed by gluing their two root vertices together and 
gluing a new root edge to this vertex. Thus the old root vertices are now the 
same immediate descendent of the new root vertex. 

The remainder of this section is devoted to the rigorous definition of these 
ideas. 

Finally, some familiarity with basic probability is assumed. If you lack 
this background, despair not. Most sets of interest here are finite, in which 
case a probability is simply a positive real function on the set which sums 
to 1. Independence allows probabilities to be multiplied in the most natural
way. 

Functions between finite sets extend by linearity to functions between the 
probabilities on these sets. For notational convenience, the original function 
and its linear extension will usually be conflated. 

2.1 Graphs, trees and roots 

A graph is a pair of sets (V, E), where E ⊆ {{u, v} | u, v ∈ V}. The set V is
called the set of vertices, and E is called the set of edges. Call {u, v} ∈ E
an edge from u to v. Note that this definition does allow 'self edges' but not
'multiple edges'. Call {u, v} ∈ E a self-edge if u = v.

Say that u, v ∈ V are adjacent, or neighbors, in graph (V, E) if {u, v} ∈ E.

A path from vertex v_1 ∈ V to vertex v_2 ∈ V in a graph (V, E) is a
finite non-empty sequence (a_i)_{i=0}^n such that a_i ∈ V, a_0 = v_1, a_n = v_2, and
{a_i, a_{i+1}} ∈ E for all i ∈ {0, 1, ..., n - 1}.

The length of a path (a_i)_{i=0}^n is defined to be n. Note that any sequence
of vertices of length 1 is a path of length 0. Thus for every vertex there is a
path from it to itself.

A path, (a_i)_{i=0}^n, is called self-intersecting if a_i = a_j for some i ≠ j.

Proposition 1 If there is a self-intersecting path from vertex u to v then
there is a non self-intersecting path from u to v.

Proof.

Suppose (a_i)_{i=0}^n is a self-intersecting path from u to v. Let
j = min{i | a_i = a_k, i, k ∈ {0, ..., n}, i ≠ k}.
Choose k ≠ j such that a_k = a_j. Now (a_i)_{i=0,...,j,k+1,...,n} is a path from u to v,
as {a_k, a_{k+1}} ∈ E and so {a_j, a_{k+1}} ∈ E. If this new path is self-intersecting
then the same argument may be applied to it. As the path length is decreased
each time, this process may be repeated only a finite number of times after
which the resulting path from u to v must be non self-intersecting. □
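
The argument in this proof is constructive, and can be sketched in a few
lines of Python. The function name is hypothetical, a path is assumed to be
a list of hashable vertices, and the sketch always picks the last repeat of
the chosen vertex, which is one admissible choice of k.

```python
def remove_self_intersections(path):
    """Shorten a path until no vertex repeats, keeping both endpoints."""
    path = list(path)
    while len(set(path)) < len(path):      # some vertex occurs twice
        # j: smallest index whose vertex occurs again later in the path
        j = next(i for i, v in enumerate(path) if v in path[i + 1:])
        # k: index of the last occurrence of that same vertex
        k = len(path) - 1 - path[::-1].index(path[j])
        # Keep a_0..a_j and continue with a_{k+1}..a_n.
        path = path[:j + 1] + path[k + 1:]
    return path

print(remove_self_intersections([1, 2, 3, 2, 4]))   # [1, 2, 4]
```

Each pass removes at least one vertex, so the loop terminates, which
mirrors the finiteness step of the proof.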

A tree is a graph such that for each pair of vertices there is exactly one 
non self-intersecting path from the first vertex to the second. 

The distance between two vertices is defined to be the minimal length of a
path from one to the other. In other words, the distance between vertices
v_1, v_2 ∈ V is defined to be d(v_1, v_2) = min{n | (a_i)_{i=0}^n is a path from v_1 to v_2}.
Note that min ∅ = +∞.
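
For a finite graph this distance can be computed by breadth-first search.
The sketch below is an illustration under stated assumptions: edges are
given as pairs (a, b), the function name distance is hypothetical, and
math.inf plays the role of the minimum over the empty set.

```python
import math
from collections import deque

def distance(vertices, edges, u, v):
    """Length of a shortest path from u to v, or math.inf if none exists."""
    neighbors = {x: set() for x in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    dist = {u: 0}                 # the one-vertex sequence (u) has length 0
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return dist[x]
        for y in neighbors[x]:
            if y not in dist:     # first visit gives the shortest length
                dist[y] = dist[x] + 1
                queue.append(y)
    return math.inf
```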

Proposition 2 d(·, ·) is a metric.

Proof.

If (a_i)_{i=0,...,n} is a path from u to v then (a_i)_{i=n,...,0} is a path from v to u.
Thus d(u, v) = d(v, u). If (a_i)_{i=0,...,n} is a path from u_1 to u_2 of length
n and (b_i)_{i=0,...,m} is a path from u_2 to u_3 of length m then a_n = b_0 and
so (a_0, ..., a_n, b_1, ..., b_m) is a path from u_1 to u_3 of length m + n. Thus
d(u_1, u_3) ≤ d(u_1, u_2) + d(u_2, u_3). Finally, (v) is a path from v to v of length
0 and so d(v, v) = 0. Conversely, a path of length 0 consists of a single
vertex, so d(u, v) = 0 implies u = v. □

Call a graph connected if there is a path from every vertex to every other
vertex. In other words, d(v_1, v_2) < ∞ for all v_1, v_2 ∈ V.

The degree of a vertex v ∈ V in a graph (V, E) is defined to be d(v) =
|{{u, v} ∈ E}| + |{{v, v} ∈ E}|. Note that a self-edge, if it exists, is counted
twice. In other words, the degree of a vertex is the number of 'half-edges'
which are incident to it.

A leaf is a vertex of degree 1.

A binary tree is a tree where every vertex has degree 1 or 3. This is 
sometimes called a trivalent tree. 
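
The degree convention, with a self-edge counted twice, can be checked on a
small example. In the sketch below every name is hypothetical and edges are
assumed to be stored as frozensets, so that a self-edge {v, v} is the
one-element set {v}.

```python
def degree(edges, v):
    """Number of half-edges at v; a self-edge {v} contributes two."""
    return sum(2 if len(e) == 1 else 1 for e in edges if v in e)

def leaves(vertices, edges):
    """All vertices of degree 1 (the root of a rooted tree is among them)."""
    return {v for v in vertices if degree(edges, v) == 1}

def is_trivalent(vertices, edges):
    """Degree condition only; does not check that the graph is a tree."""
    return all(degree(edges, v) in (1, 3) for v in vertices)

# A rooted binary tree with root "r" and two non-root leaves 1 and 2.
V = {"r", "v0", 1, 2}
E = {frozenset({"r", "v0"}), frozenset({"v0", 1}), frozenset({"v0", 2})}
print(degree(E, "v0"))      # 3
print(is_trivalent(V, E))   # True
```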

A rooted graph is a tuple (V, E, r) such that (V, E) is a graph and r ∈ V.
The vertex r is called the root of the graph. The empty graph (∅, ∅) may be
considered as a rooted graph.

A rooted tree is a rooted graph which is a tree, such that the root vertex 
is a leaf. In this case, the set of leaves and number of leaves will not include 
the root vertex. This convention will sometimes be highlighted by use of the 
term non-root leaves. 
2.2 Ancestors, descendents, parents and children

Say that a path (a_i)_{i=0}^n passes through vertex x ∈ V if a_i = x for some
i ∈ {0, ..., n}.

For the remainder of this section, let x and y be vertices of a rooted tree 
with vertex set V, edge set E and root r. 

Call y an ancestor of x if y lies on the unique non self-intersecting path
from x to the root.

Call x a descendent of y if y is an ancestor of x.

Call y a parent of x if {x, y} is an edge and y is an ancestor of x. Uniqueness
of the non self-intersecting path from x to the root and the absence of
cycles implies that the parent of a vertex is unique.

Call x a child of y if y is the parent of x. 

In this way, the vertices of a rooted tree have a poset structure, with the 
root as the unique maximum element. In this partial order, a vertex x is said 
to be greater than a vertex y if and only if x is an ancestor of y. 

Given a set of vertices, s, define the latest common ancestor of these 
vertices to be a vertex which has every element of s as a descendant, but 
for which no descendent of this vertex has that property. The finite tree 
structure guarantees that this vertex exists and is unique. 
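
One way to compute the latest common ancestor is to intersect the chains
of ancestors. The sketch below is an illustration only: the tree is assumed
to be given as a map from each non-root vertex to its parent, the function
name is hypothetical, and a vertex is counted among its own ancestors since
it lies on its own path to the root.

```python
def latest_common_ancestor(parent, vertices):
    """Lowest vertex that has every element of `vertices` as a descendent."""
    def ancestors(x):
        chain = [x]                      # x itself, then upward to the root
        while chain[-1] in parent:
            chain.append(parent[chain[-1]])
        return chain

    vertices = list(vertices)
    common = set(ancestors(vertices[0]))
    for x in vertices[1:]:
        common &= set(ancestors(x))
    # Walking up from any one vertex, the first common ancestor met is the lowest.
    return next(a for a in ancestors(vertices[0]) if a in common)

# Root "r", first branch-point "v0", cherry {1, 2} below "a", and leaf 3.
parent = {"v0": "r", "a": "v0", 3: "v0", 1: "a", 2: "a"}
print(latest_common_ancestor(parent, [1, 2]))   # a
print(latest_common_ancestor(parent, [1, 3]))   # v0
```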

2.3 Fat, thin, labeled, unlabeled and the forgetful maps 

The additional properties fat, thin, labeled and unlabeled are now defined, as 
well as the associated forgetful maps. 

A partial function between two sets X and Y consists of a subset Z of X 
and a set map from Z to Y. The subset Z is called the domain of the partial 
function. 

A partial labeling of a graph is a partial function from the vertex set to a 
set which is called the set of labels. A vertex is said to be labeled if it is in 
the domain of this partial function. 

A tree together with a labeling is called a labeled tree. If every vertex 
is labeled then the tree is said to be totally labeled. Throughout this text, 
labelings are not assumed to be total, and partially labeled trees may be 
referred to simply as labeled trees. 

A tree is said to be leaf labeled if it has a labeling such that the set of 
labeled vertices is exactly the set of leaves. In other words, the domain of 
the labeling function is the set of leaves. 
An orientation of a graph is a map which assigns to each vertex a cyclic
ordering on its set of neighbors. Call the image of a vertex under this map 
the orientation at that vertex. 

A tree together with an orientation is called a fat tree, or ribbon tree. A 
tree without an orientation is called a thin tree. Trees are assumed to be thin 
unless stated otherwise. 

The map F_o forgets orientations and the map F_l forgets labelings. Thus
applying F_o to a partially labeled fat tree gives a partially labeled thin tree.
Applying F_l to a labeled tree gives the same tree without its labeling function.
Explicitly:

Definition 3 If t is a fat tree then F_o(t) is a thin tree with the same vertex
set, edge set, and any other properties such as root or labeling.

If t is a labeled tree then F_l(t) is an unlabeled tree with the same vertex
set, edge set, and any other properties such as root or orientation.

Note that F_o and F_l commute, in the sense that applying F_o F_l or F_l F_o
to a fat labeled tree gives the thin unlabeled tree with the same vertex and
edge set, and any other properties such as a root.

F_o F_l = F_l F_o (1)

2.4 Four types of tree: fat and thin cladograms and 
tree shapes 

Definition 4 A cladogram with n leaves is a partially labeled rooted binary 
tree with n leaves (not including the root) and label set {1,2, ... ,n}, such 
that the labeled vertices are exactly the (non-root) leaves and no two leaves 
have the same label. Thus each label 1,2, ... ,n appears exactly once. Define 
the empty labeled tree to be a cladogram with 0 leaves.

The four types of tree of particular interest here are: 

• rooted binary trees, also called tree shapes; 

• cladograms, as defined above; 

• fat rooted binary trees, also called fat tree shapes. 

• fat cladograms, which are cladograms together with an orientation. 
Thus the map F_o sends fat tree shapes to tree shapes, and sends fat
cladograms to cladograms. The map F_l sends cladograms to tree shapes and
sends fat cladograms to fat tree shapes.
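
The relationship between the four types can be made concrete with a small
sketch. Everything in it is an assumption made for illustration: a fat
cladogram is written as nested pairs of integer labels, forgetting labels
replaces each label by a placeholder, and two trees are compared as thin
trees through a canonical form obtained by sorting children recursively.
The example trees are hypothetical and are not taken from Figure 1.

```python
def forget_labels(tree):
    """Replace every leaf label by the placeholder "*"."""
    if not isinstance(tree, tuple):
        return "*"
    return tuple(forget_labels(child) for child in tree)

def canonical(tree):
    """Sort children recursively so that child order no longer matters."""
    if not isinstance(tree, tuple):
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=repr))

def same_thin_tree(a, b):
    """True if a and b agree once the ordering of children is forgotten."""
    return canonical(a) == canonical(b)

a = ((1, 2), 3)
b = (3, (2, 1))
print(a == b)                  # False: different fat cladograms
print(same_thin_tree(a, b))    # True: the same thin cladogram
print(forget_labels(a))        # (('*', '*'), '*'): a fat tree shape
```

In this representation, forgetting labels and then comparing canonical
forms gives the same answer as canonicalizing first, which reflects the
fact that the two forgetful maps commute.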

2.5 Isomorphisms of trees 

In this section, isomorphism is defined for various types of tree. In summary, 
an isomorphism here is a graph isomorphism which preserves any additional 
structure. General morphisms of trees are omitted, but may be easily guessed 
at. 

An isomorphism between trees (V_1, E_1) and (V_2, E_2) is a bijection f :
V_1 → V_2 such that {f(u), f(v)} ∈ E_2 if and only if {u, v} ∈ E_1.

If either of the trees has extra structure such as a root, orientation or
labeling then both must have this extra structure and it must be preserved
by the map f. In particular:

• If r_1 is the root of the first tree then f(r_1) is the root of the second;

• If g_2 is the labeling of the second tree then g_2 ∘ f is the labeling of the
first tree;

• If (v_1, v_2, ..., v_k) is the cyclic orientation at vertex v then
(f(v_1), ..., f(v_k)) is the cyclic orientation at vertex f(v).

Isomorphic trees are considered equal. 

2.6 The action of the symmetric group on leaf labels 

Definition 5 If t is a labeled tree with labeling partial function g and label
set L and σ is a permutation of the set L then define σ(t) to be a tree identical
to t except that it has labeling function σg.

In other words, apply the permutation to each label. This defines a group 
action. 

Some permutations will act trivially on some cladograms, such as the
permutations (12)(3)(45) and (14)(25)(3) acting on the tree shown in Figure 2.

Figure 2: A cladogram invariant under permutations (12)(3)(45) and
(14)(25)(3)

Let S_n denote the permutation group of [n] = {1, 2, ..., n}. In this case,
the group action just defined extends uniquely, linearly, to an action of
probabilities on S_n upon probabilities on (fat or thin) cladograms. The action of
the element (1/n!) Σ_{σ ∈ S_n} σ will be of interest later on. This element has the effect
of applying a uniform random permutation to the leaf labels of a cladogram.
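
The action on leaf labels, and the fact that some permutations act
trivially, can be tried out on a small example. The sketch below rests on
assumptions: cladograms are nested pairs of integer labels, thin cladograms
are compared through a canonical form with sorted children, and the tree t
is a hypothetical five-leaf cladogram chosen to have the two stated
symmetries; it is not claimed to be the tree drawn in Figure 2.

```python
import random

def relabel(tree, sigma):
    """Apply the permutation sigma (a dict on labels) to every leaf label."""
    if not isinstance(tree, tuple):
        return sigma[tree]
    return tuple(relabel(child, sigma) for child in tree)

def canonical(tree):
    """Sort children recursively, so thin cladograms can be compared."""
    if not isinstance(tree, tuple):
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=repr))

def random_relabel(tree, n, rng=random):
    """Apply a uniform random permutation of {1, ..., n} to the labels."""
    labels = list(range(1, n + 1))
    sigma = dict(zip(labels, rng.sample(labels, n)))
    return relabel(tree, sigma)

t = (3, ((1, 2), (4, 5)))
swap_within = {1: 2, 2: 1, 3: 3, 4: 5, 5: 4}     # (12)(3)(45)
swap_cherries = {1: 4, 4: 1, 2: 5, 5: 2, 3: 3}   # (14)(25)(3)
print(canonical(relabel(t, swap_within)) == canonical(t))     # True
print(canonical(relabel(t, swap_cherries)) == canonical(t))   # True
```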

2.7 Useful constructions on trees 

This section covers the root join operation on trees, and the set of splits of 
a tree. The root join is used in the next chapter to define the alpha models. 
The splits of a tree are used to calculate the probability of a given tree under 
these models. 

Informal definitions are given first, followed by more rigorous definitions 
and proofs. 

If t is a fat rooted binary tree which has left subtree t_1 and right subtree
t_2 then t may be thought of as the tree formed by joining together t_1 and t_2
at their roots. This is denoted t_1 * t_2 = t. See Figure 3 for an example.

This construction also makes sense for thin (non-fat) trees and labeled
trees, and this root join operation is preserved by the maps which forget
orientation or leaf labels. Every binary tree is the root join of two subtrees
in this way.

For thin trees t_1 * t_2 = t_2 * t_1, but this is not true in general for fat trees.

Figure 3: Joining two trees at the root



Splits are defined as follows. If t = t_1 * t_2 and t_i has n_i leaves then say
that the first split of t is the ordered pair (n_1, n_2) (or the unordered pair if t
is a thin tree). Similarly, each internal node has an associated split, as it is
a branching point with some number of leaves below and to the left or right.
The multiset (set with multiplicity) of splits of a tree is useful for calculating
the probability of a tree under certain classes of self-similar probabilities. See
Figure 4 for an example.




Figure 4: The splits of a tree 



2.8 The subtree below an edge 

The definition of the subtree below an edge is useful in defining the root join 
operation. 
Figure 5: The subtree below an edge



Definition 6 Given an edge e of a rooted tree t, the subtree of t below edge
e, call it s, is the rooted tree with

vertex set V comprising all vertices of t which are descendents of both ends
of e,

edge set comprising all edges of t which have both ends in V and

root vertex the end of e which is closest to the root (and so of degree 1 in the
new tree).

Furthermore:

If t is a partially labeled tree then so is s, with labeling function the
restriction of the original labeling function to V. Thus every vertex of s is
labeled exactly as it was in t.

If t is a fat tree then so is s and the orientation of every vertex of s is the
same as the orientation of that vertex in t, with the exception of the root of s
which has orientation the length-one cycle consisting of its unique neighbor.
This is the only possible choice of orientation at the root vertex.

See Figure 5 for an example.

Proposition 7 The graph called the subtree of t below edge e is indeed a
rooted tree.

Proof.

First, show that the subtree of t below edge e, call it s, is a tree. Let
e = {v_1, v_2}, with v_1 closer to the root of t, so that the root of s is defined
to be v_1. Any path in s is a path in t and so there is at most one non
self-intersecting path between any two vertices of s. On the other hand, every
vertex of s is either v_1 or a descendent of v_2 (ancestors and descendents
referring to tree t). Every non self-intersecting path from a descendent of
v_2 to v_2 passes through descendents of v_2 only. Since all descendents of
v_2 lie in s it follows that s is connected. Thus s is a tree. Finally, v_1
has degree 1 in s, so s is a rooted tree. □



2.9 Joining two trees at the root 

This section contains the definition of the operation • * • of joining two trees 
at the root. This operation is used extensively in the definitions to come so 
several of its properties are examined in detail. 

Definition 8 Given rooted trees t_1 and t_2, let the root join of t_1 and t_2,
denoted t_1 * t_2, be the tree defined as follows:

• If t_1 is an empty tree then t_1 * t_2 = t_2. If t_2 is an empty tree then
t_1 * t_2 = t_1.

• Otherwise, the tree t_1 * t_2 includes vertices r, v_0, v_1, v_2 and edges {r, v_0},
{v_0, v_1}, {v_0, v_2}, such that r is the root vertex and, for each i in {1, 2},
the subtree of t_1 * t_2 below edge {v_0, v_i} is isomorphic, via f_i, to t_i.

Furthermore:

If t_1 and t_2 are leaf-labeled trees then so is t_1 * t_2, and the maps f_1, f_2 are
isomorphisms of partially labeled trees.

If t_1 and t_2 are fat (oriented) trees then t_1 * t_2 is a fat tree, the maps f_1, f_2
are isomorphisms of fat trees, the orientation at r is the cycle (v_0) and the
orientation at v_0 is the cycle (r, v_1, v_2).

Proposition 9 Given rooted trees t_1 and t_2 as in the previous definition, the
tree denoted t_1 * t_2 exists and is uniquely defined up to isomorphism.

Proof.

If t_1 or t_2 is the empty tree then t_1 * t_2 is equal to either t_2 or t_1 and so exists
and is uniquely defined. Suppose now that t_1 and t_2 are non-empty trees.

Assume for the moment that t_1 and t_2 are thin, unlabeled rooted trees.

Let t_i have vertex set V_i and edge set E_i. Without loss of generality,
suppose that the vertex sets of t_1 and t_2 intersect at a single element, v_0,
which is the root for both trees. Let r be an element not contained in V_1 or
V_2. This will represent the root of the new tree.

Let V = V_1 ∪ V_2 ∪ {r} and E = E_1 ∪ E_2 ∪ {{v_0, r}}. Let t be the graph with
vertex set V and edge set E. Now show that t has the properties required of
t_1 * t_2.

First, show that t is a tree. The graph t is connected as there is a path
from every vertex to the vertex v_0. Next, show that there is a unique non
self-intersecting path between any two vertices. Note that t_1, t_2, and the
graph t_3 with vertex set V_3 = {v_0, r} and edge set E_3 = {{v_0, r}} are all
trees.

Any edge from a vertex in V_i to a vertex in V_j, with i ≠ j, must contain
vertex v_0. Thus, any path from a vertex in V_i to a vertex in V_j, with i ≠ j,
must pass through v_0. Thus, any non self-intersecting path from a vertex, x,
in V_i to a vertex, y, in V_j must contain v_0 exactly once, with all vertices in
the path before v_0 lying in V_i and all those after v_0 lying in V_j. The sub-path
from x to v_0 is non self-intersecting and lies entirely in V_i and so is unique,
since (V_i, E_i) is a tree. Similarly with the sub-path from v_0 to y. Thus the
non self-intersecting path from x to y must be unique.

Any non self-intersecting path from a vertex x to y, both in V_i, must lie
entirely in V_i. Otherwise, if v is any vertex in the path not lying in V_i (and so
not equal to v_0) then the sub-path from x to v passes through v_0 as does the
sub-path from v to y. Thus v_0 appears twice on a non self-intersecting path,
which is a contradiction. Therefore, since the non self-intersecting path from
x to y lies entirely in V_i it must be unique, since (V_i, E_i) is a tree.

Now show that t has the required properties. First, it contains the
required vertices, r, v_0, v_1, v_2, and edges, {r, v_0}, {v_0, v_1}, {v_0, v_2}, which are
explicitly stated. Second, by the construction of t, for each i = 1, 2, the subtree
of t below edge {v_0, v_i} has vertex set V_i and edge set E_i and therefore is
isomorphic to t_i.

Furthermore if t_1 and t_2 are fat rooted trees, with orientation functions o_1
and o_2, then let t have orientation function o defined by o(r) = (v_0), o(v_0) =
(r, v_1, v_2) and o(v) = o_i(v) for v ∈ V_i \ {v_0}. Thus t satisfies the additional
requirements on the orientation of t_1 * t_2.

Note that vertex v is a (non-root) leaf of t if and only if v is a leaf of
either t_1 or t_2.

Furthermore, if t_1 and t_2 are leaf-labeled trees then let t be a leaf-labeled
tree such that the leaf v ∈ V_i ⊆ V of t has the same label both as a vertex of t
and of t_i. All other vertices of t are unlabeled. Thus t satisfies the additional
requirements on the labeling of t_1 * t_2.
Next show that any two trees satisfying the definition of t_1 * t_2 must
be isomorphic. Again, begin by assuming simply that t_1 and t_2 are thin
unlabeled rooted trees.

Let s_1 and s_2 be trees which satisfy the requirements of t_1 * t_2. Therefore,
s_i contains vertices r_i, v_{i0}, v_{i1}, v_{i2} and edges {r_i, v_{i0}}, {v_{i0}, v_{i1}}, {v_{i0}, v_{i2}}, such
that r_i is the root vertex of s_i and, for each j in {1, 2}, the subtree of s_i
below edge {v_{i0}, v_{ij}} is isomorphic, via f_{ij}, to t_j.

Let f be a map from the vertex set of s_1 to the vertex set of s_2 defined
such that f(r_1) = r_2, f(v_{1j}) = v_{2j} for j = 0, 1, 2 and if v is a descendent
of v_{1j}, for j in {1, 2}, then f(v) = f_{2j}^{-1} f_{1j}(v). This map is a bijection on vertices, sends the
root to the root, and maps edges to edges, as does its inverse. Thus it is an
isomorphism of thin rooted trees.

If t_1 and t_2 are both fat trees, or both leaf-labeled trees, then f is also an
isomorphism of, respectively, fat trees or leaf-labeled trees.

Thus there is exactly one tree, up to isomorphism, satisfying the
requirements of t_1 * t_2. □

Note that the sum of the number of leaves in two rooted trees is the same 
as the number of leaves in the root join of these two trees. In other words 
|t_1 * t_2| = |t_1| + |t_2|.
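
On a nested-pair representation of fat trees the root join is simply pair
formation, and the additivity of leaf counts is immediate. The sketch below
is illustrative only: the names root_join and size are hypothetical, None
stands for the empty tree, the root vertex and root edge are left implicit,
and label bookkeeping is ignored.

```python
def root_join(t1, t2):
    """Join two fat trees at the root; the empty tree (None) is neutral."""
    if t1 is None:
        return t2
    if t2 is None:
        return t1
    return (t1, t2)      # t1 becomes the left subtree, t2 the right subtree

def size(tree):
    """Number of non-root leaves."""
    if tree is None:
        return 0
    if not isinstance(tree, tuple):
        return 1
    return size(tree[0]) + size(tree[1])

a = ((1, 2), 3)
b = (4, 5)
print(size(root_join(a, b)) == size(a) + size(b))   # True
print(root_join(a, b) == root_join(b, a))           # False for fat trees
```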

Also, note that forgetting orientations before or after joining two trees at 
the root has the same effect. The same is true for forgetting leaf-labelings. 
Since this result is used often, it deserves a proposition. 

Proposition 10 If t_1 and t_2 are rooted binary trees which are both fat then
F_o(t_1 * t_2) = F_o(t_1) * F_o(t_2), and if t_1 and t_2 are rooted binary trees which are
both leaf-labeled then F_l(t_1 * t_2) = F_l(t_1) * F_l(t_2).

Proof.

This follows directly from the definitions for the binary operator • * • and
the forgetful maps F_o, which forgets orientations of fat trees, and F_l, which
forgets labelings of labeled trees. □

The following result shows that there is only one way, up to isomorphism, 
to write a binary tree as the root join of two non-empty trees. 

Lemma 11 If t is a non-empty rooted binary tree then: either t has one
(non-root) leaf; or t = t_1 * t_2 for a unique pair of non-empty trees {t_1, t_2},
and if t is a fat tree then there is a unique ordered pair (t_1, t_2) such that
t = t_1 * t_2.

Proof.

Suppose that t is a non-empty rooted binary tree which has more than 1 leaf. Therefore
t has a root, r, which has a unique neighbor, v_0. This vertex, v_0, has degree
3 and so has two distinct neighbors, v_1 and v_2, which are not the root r. If t
is a fat tree then choose v_1, v_2 so that the orientation at v_0 is (r, v_1, v_2). For
i = 1, 2, let t_i be the subtree of t below edge {v_0, v_i}. Thus the tree t_1 * t_2 is
exactly the tree t (provided that the root vertex of t_1 * t_2 is chosen to be the
same element as the root vertex of t).

Suppose that t is also equal to t_3 * t_4. By the definition of the operation • * •,
t_3 is isomorphic to the subtree of t below edge {v_0, v_i} for some i ∈ {1, 2} and
t_4 is isomorphic to the subtree of t below the other edge {v_0, v_j}, j ∈ {1, 2}
such that i ≠ j. Thus t_3 is isomorphic to one of t_1 or t_2, and t_4 is isomorphic
to the other.

Furthermore, if t is a fat tree then t = t_3 * t_4 implies that t_3 is isomorphic
to the subtree of t below edge {v_0, v_1}, which is t_1, and so the ordering of the
two trees is also unique. □



2.10 Splits 

Now for the formal definition of the splits of a tree. First, if t is a binary
rooted tree then let $|t|$ denote the number of leaves of t, also called the
size of t.

Definition 12 Suppose that $t = t_1 * t_2$ for non-trivial fat, respectively
thin, rooted binary trees $t_1$ and $t_2$. Say that t has first split
$(|t_1|, |t_2|)$, respectively $\{|t_1|, |t_2|\}$.

Lemma 11 ensures that the first split is well defined.

Definition 13 Define the family of splits of a fat (respectively thin) rooted
binary tree t inductively as follows:

splits(t) is a multiset (a set with multiplicities) such that

• If t is a one-leaf tree then $splits(t) = \emptyset$.

• If $t = t_1 * t_2$, for non-trivial $t_1$, $t_2$, then

$splits(t) = splits(t_1) \cup splits(t_2) \cup \{(|t_1|, |t_2|)\}$ for fat trees, and
$splits(t) = splits(t_1) \cup splits(t_2) \cup \{\{|t_1|, |t_2|\}\}$ for thin trees.

Again, Lemma 11 ensures that this is well defined.

An equivalent non-recursive definition is:

Definition 14 Given a rooted binary tree t, let $E_2$ be the set of edges which
do not contain a leaf. Define the multiset of splits of the tree t to be the
union over edges $e \in E_2$ of the first split of the subtree of t below e.

Equivalence of these definitions is not proven here. 
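
The recursive definition of splits can be sketched in a few lines of Python.
This is an illustration only: the nested-tuple encoding of trees, the `LEAF`
marker and the function names are assumptions made for the example and are
not notation from this report. An unordered split $\{a, b\}$ of a thin tree
is stored as a sorted pair.

```python
from collections import Counter

LEAF = "*"  # hypothetical marker for a one-leaf tree shape


def size(t):
    """Number of leaves |t| of a tree given as nested pairs."""
    return 1 if t == LEAF else size(t[0]) + size(t[1])


def splits(t, fat=True):
    """Multiset of splits of t, following the recursive definition."""
    if t == LEAF:
        return Counter()  # a one-leaf tree has no splits
    t1, t2 = t
    a, b = size(t1), size(t2)
    # Ordered pair for fat trees, sorted pair standing in for {a, b} otherwise.
    first = (a, b) if fat else tuple(sorted((a, b)))
    return splits(t1, fat) + splits(t2, fat) + Counter([first])


tree = (((LEAF, LEAF), LEAF), (LEAF, LEAF))
print(splits(tree))
print(splits(tree, fat=False))
```

A tree with n leaves always has n - 1 splits, one per branch point, which is
a quick sanity check on any implementation.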

3 The alpha models 

Now that the requisite constructions and definitions are at hand, the alpha 
models may be defined. 

The alpha models are four parameterized sequences of probability mea- 
sures, one for each of the four types of trees focused on here: tree shapes, fat 
tree shapes, cladograms and fat cladograms. For each type of tree, the n-th 
element of the corresponding sequence is a probability measure on the set of 
trees of that type with exactly n leaves. Each alpha model has a single real
parameter $\alpha \in [0, 1]$.

Each of the four sequences is constructed in a similar manner to the 
others, using successive alpha insertions to build up each probability mea- 
sure. The four are related through the maps which forget orientation and 
leaf-labels. Each also has two interesting properties, called Markovian self- 
similarity and deletion stability (also called sampling consistency). These two 
properties are briefly mentioned below and properly defined in the following 
sections. 

The alpha model on cladograms is perhaps of most practical interest. It 
is also representative of all four models, and is now described. 

Alpha insertion of a leaf labeled k into a cladogram is performed as follows.
Give each leaf edge weight $1 - \alpha$ and all other edges weight $\alpha$.
Choose an edge at random with probability proportional to these weights and
attach a new leaf edge to the middle of this edge. Label the newly created
leaf k.
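
The insertion rule above can be sketched as follows. This is a minimal
illustration under stated assumptions, not code from this report: a cladogram
is encoded as nested pairs with integer leaf labels, every subtree is
identified with the edge directly above it (so the whole tree stands for the
root edge), and the names `edges`, `graft` and `alpha_insert` are
hypothetical. Orientation of the pairs is ignored, as for thin cladograms.

```python
import random


def edges(t, path=()):
    """Yield (path, is_leaf_edge) for the edge above every subtree of t."""
    yield path, isinstance(t, int)
    if not isinstance(t, int):
        yield from edges(t[0], path + (0,))
        yield from edges(t[1], path + (1,))


def graft(t, path, k):
    """Attach a new leaf k to the middle of the edge above the subtree at path."""
    if not path:
        return (t, k)
    left, right = t
    if path[0] == 0:
        return (graft(left, path[1:], k), right)
    return (left, graft(right, path[1:], k))


def alpha_insert(t, k, alpha, rng=random):
    """Alpha-insert leaf k: leaf edges weigh 1 - alpha, all other edges alpha."""
    if isinstance(t, int):
        return (t, k)  # a one-leaf tree has a single edge to choose from
    paths, weights = [], []
    for path, is_leaf_edge in edges(t):
        paths.append(path)
        weights.append(1 - alpha if is_leaf_edge else alpha)
    return graft(t, rng.choices(paths, weights=weights)[0], k)


def leaf_labels(t):
    """List of leaf labels of t."""
    return [t] if isinstance(t, int) else leaf_labels(t[0]) + leaf_labels(t[1])


tree = 1
for k in range(2, 7):
    tree = alpha_insert(tree, k, alpha=0.5)
print(tree, sorted(leaf_labels(tree)))
```

With `alpha=1.0` only internal edges (including the root edge) can be chosen,
and with `alpha=0.0` only leaf edges can be chosen.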

A random cladogram with n leaves from this model may be constructed as 
follows. Take a rooted tree with a single leaf and label this leaf 1. Successively 
insert leaves labeled 2,3, ... ,n into the tree according to the alpha insertion 
rule. Once all of the leaves have been inserted, apply a uniform random 
permutation to the leaf labels. Thus, the resulting distribution on cladograms 
is symmetric under permutation of leaf labels.

Figure 6: The weight of a leaf edge is $1 - \alpha$, the weight of an internal
edge is $\alpha$.

Figure 7: The resulting cladogram with weights after inserting into the
highlighted edge in Figure 6.

The alpha model is also deletion stable in the sense that if a random leaf is
deleted (without loss of generality, the leaf with the largest label), then
the resulting smaller tree is also distributed according to the alpha model
with the same value of $\alpha$. This implies sampling consistency: given a
random cladogram from the alpha distribution on cladograms with n > k leaves,
the shape of the subtree spanned by leaves 1, 2, ..., k is distributed as the
alpha distribution on cladograms with k leaves. In the case of unlabeled
trees, a subtree spanned by k randomly chosen leaves is distributed as an
unlabeled alpha tree with k leaves. Deletion stability is described in more
detail in Section 3.7.

Another nice property of the alpha model is that it is Markovian self-similar
(also called Markov branching). This means that if the subtree below any edge
has k leaves, then the shape of this subtree is distributed as the alpha model
on k leaf trees, and is independent from the shape of the rest of the tree
(conditional on there being exactly k leaves below the given edge). This is
covered in Section 3.3, in particular Proposition 23.

Note that neither Markovian self-similarity nor sampling consistency (dele- 
tion stability) implies the other. 

The alpha models are also the stationary distributions of certain Markov 
chains (one for each model). These Markov chains 'project' onto each other 
via the forgetful maps. These will be examined in later work. 

In the formal definitions and proofs to follow, a recursive definition of
alpha insertion is used. Notice that if there are $n = n_1 + n_2$ leaves in
total, with $n_1$ leaves below one side of the first branch point and $n_2$
below the other, then the probability that the new leaf is inserted in some
edge down the first branch is $\frac{n_1 - \alpha}{n - \alpha}$, the
probability that it is inserted in some edge down the other side is
$\frac{n_2 - \alpha}{n - \alpha}$, and the probability that it is inserted at
the root edge is $\frac{\alpha}{n - \alpha}$. This observation is the basis of
the recursive definition of alpha insertion.
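
These three probabilities can be checked by adding up edge weights directly.
The sketch below is illustrative only: the nested-pair encoding, the `LEAF`
marker and the helper `weight` are assumptions made for the example. A branch
with $n_1$ leaves contains $n_1$ leaf edges and $n_1 - 1$ other edges
(counting the edge joining it to the branch point), so its total weight is
$n_1(1 - \alpha) + (n_1 - 1)\alpha = n_1 - \alpha$.

```python
from fractions import Fraction

LEAF = "*"  # hypothetical marker for a leaf of a tree shape


def size(t):
    """Number of leaves of a tree given as nested pairs."""
    return 1 if t == LEAF else size(t[0]) + size(t[1])


def weight(t, alpha):
    """Total weight of the edge above t together with all edges inside t."""
    if t == LEAF:
        return 1 - alpha  # a leaf edge
    return alpha + weight(t[0], alpha) + weight(t[1], alpha)


alpha = Fraction(1, 3)
t1, t2 = ((LEAF, LEAF), LEAF), (LEAF, LEAF)
t = (t1, t2)
n = size(t)

total = weight(t, alpha)              # equals n - alpha
p_first = weight(t1, alpha) / total   # insertion somewhere down the first branch
p_second = weight(t2, alpha) / total  # insertion somewhere down the other branch
p_root = alpha / total                # insertion at the root edge
print(p_first, p_second, p_root)
```

Exact rational arithmetic is used so that the identities hold without
rounding error.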

3.1 Recursive definitions of alpha insertion 

The recursive definitions of alpha insertion for each type of tree are given 
below. 

For the remainder of this section, let s denote the one leaf binary rooted 
tree (tree shape), which is fat or thin as required by the context. This tree 
has two vertices and a single edge from the root vertex to the non-root leaf. 
Let $s_x$ be the one leaf binary rooted tree with leaf labeled x.

Let $|t|$ denote the number of leaves of a binary rooted tree t.






Definition 15 Let t be a fat binary rooted tree (fat tree shape). Define
$i_\alpha(t)$ as follows. If t has one leaf then define
$i_\alpha(t) = \frac{1}{2}(t * s + s * t)$. If not, then $t = t_1 * t_2$ for
unique non-trivial $t_1$ and $t_2$. In this case define

$$i_\alpha(t) = \frac{|t_1| - \alpha}{|t| - \alpha}\, i_\alpha(t_1) * t_2
  + \frac{|t_2| - \alpha}{|t| - \alpha}\, t_1 * i_\alpha(t_2)
  + \frac{\alpha}{|t| - \alpha} \cdot \frac{1}{2}(s * t + t * s)$$

Definition 16 If t is a thin binary rooted tree then $i_\alpha(t)$ is given by
exactly the same formulae as in the case of a fat tree.

Uniqueness of the unordered pair $\{t_1, t_2\}$ and commutativity of the root
join operation on thin trees ensure that $i_\alpha$ is well defined in this
case.

Definition 17 Let $i_{\alpha,x}$ be defined for (fat or thin) leaf labeled
rooted trees identically to $i_\alpha$ with the exception that the unlabeled
single leaf tree s is replaced everywhere with the labeled single leaf tree
$s_x$.
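
The recursive formula for $i_\alpha$ on fat tree shapes can be turned into a
small program that returns the resulting probability distribution. This is a
sketch under assumptions: trees are nested pairs with the hypothetical marker
`S` for the one leaf tree s, a formal sum of trees is stored as a dictionary
from trees to probabilities, and `alpha_insert_dist` is a name invented for
the example.

```python
from collections import defaultdict
from fractions import Fraction

S = "*"  # hypothetical marker for the one-leaf tree s


def size(t):
    """Number of leaves of a tree given as nested pairs."""
    return 1 if t == S else size(t[0]) + size(t[1])


def alpha_insert_dist(t, alpha):
    """Distribution i_alpha(t) on fat tree shapes as {tree: probability}."""
    out = defaultdict(Fraction)
    if t == S:
        out[(t, S)] += Fraction(1, 2)
        out[(S, t)] += Fraction(1, 2)
        return dict(out)
    t1, t2 = t
    n = size(t)
    # Insert somewhere in the first branch, then rejoin with the second.
    for u, p in alpha_insert_dist(t1, alpha).items():
        out[(u, t2)] += p * (size(t1) - alpha) / (n - alpha)
    # Insert somewhere in the second branch.
    for u, p in alpha_insert_dist(t2, alpha).items():
        out[(t1, u)] += p * (size(t2) - alpha) / (n - alpha)
    # Insert at the root edge, on either side with equal probability.
    root = alpha / (n - alpha) / 2
    out[(S, t)] += root
    out[(t, S)] += root
    return {u: p for u, p in out.items() if p}  # drop zero-probability trees


dist = alpha_insert_dist(((S, S), S), Fraction(1, 3))
for tree, prob in dist.items():
    print(prob, tree)
```

For thin tree shapes the same recursion applies once each pair is put into a
canonical order, since the root join is then commutative.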

Proposition 18 Alpha insertion commutes with forgetting orientation or leaf
labels.

In other words, if $F_l$ is the function which forgets leaf labels and $F_o$
is the function which forgets orientations then

$$F_l(i_{\alpha,x}(t)) = i_\alpha(F_l(t))$$
$$F_o(i_{\alpha,x}(t)) = i_{\alpha,x}(F_o(t))$$
$$F_o(i_\alpha(t)) = i_\alpha(F_o(t))$$

for trees t of the appropriate type.

Proof.

That alpha insertion commutes with forgetting orientations, $F_o$, follows
directly from the definitions. The case of forgetting leaf labels follows by a
simple induction.

For the initial case, $F_l(s_x) = s$ and the root join operation is preserved
by the map which forgets labels (Proposition 10). Thus, if t is a single leaf
tree then

$$F_l(i_{\alpha,x}(t)) = F_l\left(\tfrac{1}{2}(t * s_x + s_x * t)\right)
  = \tfrac{1}{2}\left(F_l(t) * F_l(s_x) + F_l(s_x) * F_l(t)\right)
  = \tfrac{1}{2}\left(F_l(t) * s + s * F_l(t)\right) = i_\alpha(F_l(t)).$$

For the inductive step, if t is not a single-leaf tree then $t = t_1 * t_2$
for non-trivial trees $t_1, t_2$. Assume the statement is true for all trees
smaller than t, that is, with fewer leaves. As $F_l$ respects $*$ it follows
that $F_l(i_{\alpha,x}(t))$ is equal to

$$\frac{|t_1| - \alpha}{|t| - \alpha} F_l(i_{\alpha,x}(t_1)) * F_l(t_2)
  + \frac{|t_2| - \alpha}{|t| - \alpha} F_l(t_1) * F_l(i_{\alpha,x}(t_2))
  + \frac{\alpha}{|t| - \alpha} \cdot \frac{1}{2}\left(s * F_l(t) + F_l(t) * s\right)$$

By the inductive assumption this is equal to

$$\frac{|t_1| - \alpha}{|t| - \alpha} i_\alpha(F_l(t_1)) * F_l(t_2)
  + \frac{|t_2| - \alpha}{|t| - \alpha} F_l(t_1) * i_\alpha(F_l(t_2))
  + \frac{\alpha}{|t| - \alpha} \cdot \frac{1}{2}\left(s * F_l(t) + F_l(t) * s\right)$$

which is equal to $i_\alpha(F_l(t))$ as desired, since
$F_l(t) = F_l(t_1 * t_2) = F_l(t_1) * F_l(t_2)$. □



3.2 Definitions of the alpha models 

The definitions of the alpha models for each of the four classes of trees are
very similar. Each involves successive alpha insertions, of labeled or
unlabeled leaves, and then a final uniform randomization of leaf labels in the
labeled cases.

The definitions of the alpha models depend upon a single variable, usually
called alpha or $\alpha$, which lies in the range [0, 1]. Assume throughout
that $\alpha$ is some fixed number.

Let $u_n = \frac{1}{n!} \sum_{\sigma \in S_n} \sigma$ be the uniform
probability measure on the permutations of $[n] = \{1, 2, \ldots, n\}$.

Definition 19 The alpha model on fat cladograms is a sequence of probability
measures $(P_n)_{n=1}^{\infty}$, such that $P_n$ is a probability measure on
the set of fat cladograms with n leaves, $P_0$ is the unique measure on the
single fat cladogram with zero leaves (the empty tree), and for all integers
$n \geq 1$

$$P_n = u_n\, i_{\alpha,n} \cdots i_{\alpha,2}\, i_{\alpha,1} P_0$$

In other words, since $i_{\alpha,1} P_0 = P_1$ is the unique measure on the
single leaf tree, this definition says: start with the single leaf tree with
leaf labeled 1, alpha insert leaves labeled 2 up to n and then randomly
permute the leaf labels.

Proposition 20 If $(P_n)_{n=1}^{\infty}$ is the alpha model on fat cladograms
then for all integers $n \geq 1$

$$P_n = u_n\, i_{\alpha,n} P_{n-1}$$

Proof.

The right hand side of the equation is equal to
$u_n\, i_{\alpha,n}\, u_{n-1}\, i_{\alpha,n-1} \cdots i_{\alpha,1} P_0$.
Since alpha insertion does not depend on the position of the labels of t, it
follows that $i_{\alpha,n} u_{n-1} t = \sigma\, i_{\alpha,n} t$, where
$\sigma$ is the image of $u_{n-1}$ under the usual injection of $S_{n-1}$ into
$S_n$. Thus the right hand side is equal to
$u_n \sigma\, i_{\alpha,n}\, i_{\alpha,n-1} \cdots i_{\alpha,1} P_0$, which is
equal to $P_n$ because $u_n \sigma = u_n$ for any permutation $\sigma$. □

Define the alpha model on thin cladograms, fat tree shapes, and thin tree
shapes to be the image of the alpha model on fat cladograms under the
appropriate forgetful maps, $F_o$ and $F_l$, which forget orientations and
leaf-labels respectively. Specifically:

Definition 21 If $(P_i)_{i=1}^{\infty}$ is the alpha model on fat cladograms,
$F_l$ is the function which forgets leaf labels and $F_o$ is the function
which forgets orientations then define:

$(F_o(P_i))_{i=1}^{\infty}$ to be the alpha model on cladograms.
$(F_l(P_i))_{i=1}^{\infty}$ to be the alpha model on rooted binary fat trees.
$(F_o F_l(P_i))_{i=1}^{\infty} = (F_l F_o(P_i))_{i=1}^{\infty}$ to be the alpha model on rooted binary trees.

Proposition 22 If $(P_i)_{i=1}^{\infty}$ is the alpha model on cladograms then

$$P_n = u_n\, i_{\alpha,n} P_{n-1} = u_n\, i_{\alpha,n} \cdots i_{\alpha,1} P_0$$

Proposition 23 If $(P_i)_{i=1}^{\infty}$ is the alpha model on (fat or thin)
tree shapes then:

$$P_n = i_\alpha P_{n-1} = \cdots = i_\alpha^{\,n} P_0$$

Proof.

Both of these propositions follow immediately from the previous two
definitions, the fact that alpha insertion and the forgetful maps $F_o$ and
$F_l$ 'commute' (Proposition 18), and the fact that $F_l(\sigma t) = F_l(t)$
for any fat or thin cladogram t and any permutation, $\sigma$, of leaf
labels. □



3.3 Markovian self-similarity

Markovian self-similarity basically means that the subtree below any edge is
picked from the distribution on trees of the correct size, independently of the
rest of the tree. It is also called Markov branching by Aldous in [4], as each
branching happens independently of those above or on other paths from the
root.

Definition 24 Let $(P_n)_{n=1}^{\infty}$ be a sequence of probability measures
where $P_n$ is a probability on (fat) rooted binary trees with n leaves. Say
that $(P_n)_{n=1}^{\infty}$ is Markovian self-similar if there exist real
numbers $q(a, b) \geq 0$, for all integers $a, b \geq 1$, such that, for all
integers $n \geq 2$, $\sum_{m=1}^{n-1} q(m, n - m) = 1$ and

$$P_n = \sum_{m=1}^{n-1} q(m, n - m)\, P_m * P_{n-m}$$

In other words, the trees below each child of the first branch-point are
distributed independently from the same sequence of probabilities, conditional
on the number of leaves they each have.

Call $q(\cdot, \cdot)$ the conditional split distribution of
$(P_n)_{n=1}^{\infty}$.

Must q be unique?

Proposition 25 Suppose that such a q exists. Then in the case of fat trees q
is unique, and in the case of thin trees there is a unique symmetric q.

Proof.

In the case of fat rooted binary trees, $t_1 * t_2 = t_3 * t_4$ if and only if
$t_1 = t_3$ and $t_2 = t_4$. In particular, the number of leaves of $t_1$ is
equal to the number of leaves of $t_3$, so
$\sum_{m=1}^{n-1} q_1(m, n - m) P_m * P_{n-m} = \sum_{m=1}^{n-1} q_2(m, n - m) P_m * P_{n-m}$
if and only if $q_1 = q_2$.

In the case of thin rooted binary trees, $t_1 * t_2 = t_3 * t_4$ if and only
if $t_1 = t_3$ and $t_2 = t_4$, or $t_1 = t_4$ and $t_2 = t_3$. In particular,
the number of leaves of $t_1$ is equal to the number of leaves of either $t_3$
or $t_4$, so
$\sum_{m=1}^{n-1} q_1(m, n - m) P_m * P_{n-m} = \sum_{m=1}^{n-1} q_2(m, n - m) P_m * P_{n-m}$
if and only if
$q_1(m, n - m) + q_1(n - m, m) = q_2(m, n - m) + q_2(n - m, m)$.

Thus in the case of thin rooted binary trees there is a unique q such that
$q(a, b) = q(b, a)$. □

Conditional split distributions for thin trees are henceforth assumed to be
symmetric in this way unless otherwise stated.

Definition 26 Similarly, a sequence $(P_n)_{n=1}^{\infty}$ of probability
measures on (fat/thin) cladograms is called Markovian self-similar when the
corresponding sequence on unlabeled trees, $(F_l(P_n))_{n=1}^{\infty}$, is
Markovian self-similar.

3.4 Markovian self-similarity of the alpha models 

Define $\Gamma_\alpha(n) = (n - 1 - \alpha)(n - 2 - \alpha) \cdots (2 - \alpha)(1 - \alpha)$
with $\Gamma_\alpha(1) = 1$. Thus $\Gamma_0$ is the usual gamma function on
the integers.

Lemma 27 The four alpha models are all Markovian self-similar with the same
conditional split distribution:

$$q_\alpha(a, b) = \frac{\Gamma_\alpha(a)\,\Gamma_\alpha(b)}{\Gamma_\alpha(a + b)}
  \left( \frac{\alpha}{2} \binom{a + b}{a} + (1 - 2\alpha) \binom{a + b - 2}{a - 1} \right)$$

Proof.

By Definition 26 and the fact that the forgetful map $F_o$ respects the root
join, it suffices to prove that the alpha model on fat tree shapes is
Markovian self-similar with the specified split distribution.

First, use induction to show that the first split of the alpha model on fat
trees is distributed according to $q_\alpha$.

Recall that $P_{n+1} = i_\alpha P_n$ (Proposition 23), and that if $t_1$ has a
leaves and $t_2$ has b leaves then the tree $t_1 * t_2$ has first split (a, b)
(Definition 12).

By the formula for alpha insertion, $i_\alpha$ (Definition 15), if t is a fat
tree shape with first split (a, b) and $n = a + b$ leaves then $i_\alpha t$ is
a fat tree with first split:

• (a + 1, b) with probability $\frac{a - \alpha}{n - \alpha}$

• (a, b + 1) with probability $\frac{b - \alpha}{n - \alpha}$

• (1, a + b) with probability $\frac{\alpha}{2(n - \alpha)}$

• (a + b, 1) with probability $\frac{\alpha}{2(n - \alpha)}$

To start the induction, note that for n = 2 there is only one fat tree, and
$q_\alpha(1, 1) = 1$ as it should.

Next, suppose t is a random fat tree shape with n leaves. Show that if the
first split, $(a, n - a)$, of t is distributed as $q_\alpha(a, n - a)$ then
the first split of $i_\alpha t$ is distributed as $q_\alpha(a, n + 1 - a)$.

In other words, show that $q_\alpha$ satisfies the following equations:

$q_\alpha(1, 1) = 1$, and for all $a, b > 1$:

$$q_\alpha(a, b) = q_\alpha(a - 1, b) \frac{a - 1 - \alpha}{a + b - 1 - \alpha}
  + q_\alpha(a, b - 1) \frac{b - 1 - \alpha}{a + b - 1 - \alpha}$$

$$q_\alpha(1, b) = \frac{\alpha/2}{b - \alpha} + q_\alpha(1, b - 1) \frac{b - 1 - \alpha}{b - \alpha}$$

$$q_\alpha(a, 1) = \frac{\alpha/2}{a - \alpha} + q_\alpha(a - 1, 1) \frac{a - 1 - \alpha}{a - \alpha}$$

This computation is omitted.

Thus, by induction, the first split of the alpha model is distributed
according to $q_\alpha$.
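
The omitted computation can at least be checked numerically. The sketch below
assumes the closed form $q_\alpha(a, b) = \frac{\Gamma_\alpha(a)\Gamma_\alpha(b)}{\Gamma_\alpha(a+b)}
\left(\frac{\alpha}{2}\binom{a+b}{a} + (1 - 2\alpha)\binom{a+b-2}{a-1}\right)$
and verifies, in exact rational arithmetic, that it satisfies the three
recurrences for small trees. The function names are invented for the example,
and the check is restricted to $0 \leq \alpha < 1$ because
$\Gamma_\alpha(n)$ vanishes at $\alpha = 1$.

```python
from fractions import Fraction
from math import comb


def gamma_alpha(n, alpha):
    """Gamma_alpha(n) = (n-1-alpha)(n-2-alpha)...(1-alpha), Gamma_alpha(1) = 1."""
    out = Fraction(1)
    for k in range(1, n):
        out *= k - alpha
    return out


def q_alpha(a, b, alpha):
    """Conditional split distribution of the alpha model (needs alpha < 1)."""
    n = a + b
    pref = gamma_alpha(a, alpha) * gamma_alpha(b, alpha) / gamma_alpha(n, alpha)
    return pref * (alpha / 2 * comb(n, a) + (1 - 2 * alpha) * comb(n - 2, a - 1))


def satisfies_recurrences(alpha, n_max):
    """Check the recurrences for every split (a, b) with 3 <= a + b <= n_max."""
    for n in range(3, n_max + 1):
        for a in range(1, n):
            b = n - a
            den = n - 1 - alpha
            # Inflow from (a - 1, b), or from a root insertion when a = 1.
            left = q_alpha(a - 1, b, alpha) * (a - 1 - alpha) if a > 1 else alpha / 2
            # Inflow from (a, b - 1), or from a root insertion when b = 1.
            right = q_alpha(a, b - 1, alpha) * (b - 1 - alpha) if b > 1 else alpha / 2
            if q_alpha(a, b, alpha) != (left + right) / den:
                return False
    return True


alpha = Fraction(1, 3)
print(satisfies_recurrences(alpha, 9))
print([q_alpha(m, 5 - m, alpha) for m in range(1, 5)])
```

Passing `Fraction` values for `alpha` keeps every comparison exact; floating
point inputs would need a tolerance instead of `!=`.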

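Next, to show that the alpha model is Markovian self-similar. In other words,
show that if $(P_n)_n$ is the alpha model on fat tree shapes then

$$P_n = \sum_{m=1}^{n-1} q_\alpha(m, n - m)\, P_m * P_{n-m}$$

This equation is true for n = 2. Suppose it is true for some n, then
$P_{n+1} = i_\alpha P_n = \sum_{m=1}^{n-1} q_\alpha(m, n - m)\, i_\alpha(P_m * P_{n-m})$.

Recall that if $t = t_1 * t_2$ then
$i_\alpha(t) = \frac{|t_1| - \alpha}{|t| - \alpha} i_\alpha(t_1) * t_2
+ \frac{|t_2| - \alpha}{|t| - \alpha} t_1 * i_\alpha(t_2)
+ \frac{\alpha}{|t| - \alpha} \cdot \frac{1}{2}(s * t + t * s)$
(Definition 15). It follows that $i_\alpha(P_m * P_{n-m})$ is equal to

$$\frac{m - \alpha}{n - \alpha} P_{m+1} * P_{n-m}
  + \frac{n - m - \alpha}{n - \alpha} P_m * P_{n-m+1}
  + \frac{\alpha}{n - \alpha} \cdot \frac{1}{2}\left(P_1 * (P_m * P_{n-m}) + (P_m * P_{n-m}) * P_1\right)$$

Summing over m with weights $q_\alpha(m, n - m)$ and using the inductive
hypothesis, the last term contributes
$\frac{\alpha}{n - \alpha} \cdot \frac{1}{2}(P_1 * P_n + P_n * P_1)$.

Thus $P_{n+1}$ is a linear combination of terms of the form
$P_m * P_{n+1-m}$ and so is equal to
$\sum_{m=1}^{n} q(m, n + 1 - m)\, P_m * P_{n+1-m}$ for some q. Since q is the
distribution of the first split, by the argument above it must be equal to
$q_\alpha$. Thus, the inductive step holds and the lemma is proven. □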

The proof of Lemma 27, whilst perfectly correct, gives no indication as to
how the formula was derived in the first place. One possible derivation of
the formula is sketched in the discussion below.

Discussion: First, recall the recurrence relations for the conditional split
distributions $q_\alpha$, shown in the proof of Lemma 27.

Next, use a network flow argument to find a closed form solution for the
recurrence equations.

Figure 8: A flow network for the split distribution of the alpha model

Think of the above triangular diagram (Figure 8) as a directed flow network,
with as yet unspecified sources and sinks. The labels on each edge are the
multiplying factor applied to the flow out of the starting vertex before it
is added to the ending vertex. Now choose the sources/sinks so that the net
flow through each vertex (a, b), for $a, b \geq 1$, is the conditional
probability of the first split being (a, b) when the tree has a + b leaves
total.

Notice that if this is true for one line $(1, n - 1), \ldots, (n - 1, 1)$,
then the contributions to the next line will be just as in the recurrence
equations, except for the contribution to (1, n) and (n, 1). This missing
contribution should come from a flow of $\alpha/2$ through (0, n) and (n, 0).

This implies that the node (0, 0) should be a source with inflow $\alpha/2$,
so that the flows from (0, n) to (1, n) and from (n, 0) to (n, 1) are
$\alpha/2$ times the edge factor, as needed. This then implies that the node
(1, 1) must be a source with enough inflow so that the total inflow is 1,
since $q_\alpha(1, 1) = 1$. Thus it must be a source with inflow
$\frac{1 - 2\alpha}{1 - \alpha}$. By the argument above, these are all the
sources needed. Figure 9 shows the network with the total flow into each node.

Figure 9: The flow at each node of the network

Finally, notice that all paths between any two nodes have the same product. A
path from (0, 0) to (a, b) has weight
$\frac{\Gamma_\alpha(a)\Gamma_\alpha(b)}{\Gamma_\alpha(a + b)}$, and one from
(1, 1) has weight
$\frac{\Gamma_\alpha(a)\Gamma_\alpha(b)}{\Gamma_\alpha(a + b)}(1 - \alpha)$.
Also, note that there are $\binom{a + b}{a}$ possible paths from (0, 0) to
(a, b) and $\binom{a + b - 2}{a - 1}$ possible paths from (1, 1) to (a, b).

Summing the inflow by path and source now gives the stated formula for
$q_\alpha$:

$$q_\alpha(a, b) = \frac{\alpha}{2} \binom{a + b}{a} \frac{\Gamma_\alpha(a)\Gamma_\alpha(b)}{\Gamma_\alpha(a + b)}
  + \frac{1 - 2\alpha}{1 - \alpha} \binom{a + b - 2}{a - 1} \frac{\Gamma_\alpha(a)\Gamma_\alpha(b)}{\Gamma_\alpha(a + b)} (1 - \alpha)$$



3.5 Calculating the probability of a tree 

This section gives a simple method for calculating the probability of a tree 
under a Markovian self-similar sequence of probabilities, such as the alpha 
model. Examples of the probabilities of small tree shapes are worked out. 

First consider fat tree shapes (unlabeled rooted binary fat trees).

Proposition 28 Suppose that $(P_i)_i$ is a sequence of probabilities on fat
tree shapes which is Markovian self-similar, with conditional split
distributions given by q. If t is a tree with n leaves whose family of splits
is F then

$$P_n(t) = \prod_{(a,b) \in F} q(a, b)$$

Proof.

The statement is true for the single tree with one leaf, and for the single
tree with two leaves. For the pedantic, when n = 0 the empty product is 1
which is equal to the probability of the empty tree.

If the tree t has at least 2 leaves it may be written as $t = t_1 * t_2$ and
so $F = splits(t) = \{(a_1, b_1)\} \cup splits(t_1) \cup splits(t_2)$ (as a
union of multisets). The probability that a random tree $t'$ has first split
$(a_1, b_1)$ is $q(a_1, b_1)$. Conditional on this, the probability that
$t' = t = t_1 * t_2$ is $P_{a_1}(t_1) P_{b_1}(t_2)$ (by the definition of
Markovian self-similarity). By induction this is
$\prod_{(a,b) \in splits(t_1)} q(a, b) \prod_{(a,b) \in splits(t_2)} q(a, b)$.
Thus the probability of tree t is $P_n(t) = \prod_{(a,b) \in F} q(a, b)$ as
desired. □

If q is a split distribution, then define $q\{a, b\} = q(a, b) + q(b, a)$ if
$a \neq b$ and $q\{a, a\} = q(a, a)$.

Proposition 29 Suppose that $(P_i)_i$ is a sequence of probabilities on thin
tree shapes which is Markovian self-similar, with conditional split
distributions given by q. If t is an unlabeled thin rooted binary tree with n
leaves whose family of splits is F then

$$P_n(t) = \prod_{\{a,b\} \in F} q\{a, b\}$$

Proof.

This proof is almost identical to that above. In this case the probability
that t has first split $\{a, b\}$ is $q\{a, b\}$. □

For fat cladograms (labeled fat rooted trees): 

Corollary 30 Suppose that $(P_i)_i$ is a Markovian self-similar sequence of
probabilities on fat cladograms, with conditional split distribution q, such
that if $F_l(t_1) = F_l(t_2)$ then $P_n(t_1) = P_n(t_2)$. (In other words, any
two cladograms with the same shape have the same probability.) Then if t is a
fat cladogram with n leaves:

$$P_n(t) = \frac{1}{n!} \prod_{(a,b) \in splits(t)} q(a, b)$$

Proof.

By Definition 26 the family $(F_l(P_i))_i$ is a sequence of Markovian
self-similar probabilities on fat tree shapes (unlabeled fat rooted binary
trees). Since $F_l(t_1) = F_l(t_2)$ implies $P_n(t_1) = P_n(t_2)$ and the
pre-image of any fat tree shape under $F_l$ has size $n!$ it follows that
$P_n(t) = \frac{1}{n!} F_l(P_n)(F_l(t))$. Since the map $F_l$ does not change
the family of splits of a tree it now follows that
$P_n(t) = \frac{1}{n!} \prod_{(a,b) \in splits(t)} q(a, b)$, as desired. □

Before proceeding to the case of cladograms, a lemma is needed. Say that 
a branch point is symmetric if the subtrees below each child edge are equal 
to each other. See Figure 10 for example.




Figure 10: The first branch point of this tree is symmetric 



Lemma 31 If t is a tree shape with n leaves and k symmetric branch points
then the number of cladograms with shape t is equal to $n!/2^k$.

Proof.

The symmetric group on $[n] = \{1, 2, \ldots, n\}$ acts transitively on the
set of cladograms with shape t. The aim is now to show that the number of
permutations which fix any cladogram with shape t is $2^k$. The lemma follows
immediately from this.

Proceed by induction. The statement to be proven is that a rooted binary
tree with distinctly labeled leaves and k symmetric branch points is fixed by
$2^k$ permutations of its leaf labeling set.

First, this is trivially true for a labeled tree with 1 leaf. 

Suppose that the lemma is true for all trees smaller than t.

Suppose that the first branch-point of t is not symmetrical (the easy case).
Then $t = t_1 * t_2$ for $t_1$ and $t_2$ with distinct shapes and so any
permutation which fixes t must fix the set of leaves of $t_1$ and the set of
leaves of $t_2$. Thus the group fixing t is the direct product of the group
fixing $t_1$ and the group fixing $t_2$. If $t_1$ has $k_1$ symmetric branch
points and $t_2$ has $k_2$ symmetric branch points then t has
$k = k_1 + k_2$ symmetric branch points. Therefore, by the inductive
assumption, the group fixing t has size $2^k = 2^{k_1} 2^{k_2}$.

Suppose that the first branch point of t is symmetrical (the hard case).
Then $t = t_1 * t_2$ where $t_1$ and $t_2$ have the same shape. As they have
isomorphic shapes, $t_1$ and $t_2$ both have the same number of symmetric
branch points, say $k_1$. Thus t has $k = 1 + 2k_1$ symmetric branch points.

Now, every permutation which fixes t must fix the unordered partitioning
of leaf labels into those of one subtree and those of the other. Thus, any
permutation which fixes t must either swap the two parts or not. In each
case, by the inductive assumption there are then $2^{k_1}$ distinct ways to
permute the elements of each part without changing the cladogram. Thus the
order of the group fixing t is $2 \times 2^{k_1} 2^{k_1} = 2^k$ as desired. □

The appendix contains a list of all tree shapes with up to 7 leaves, along
with the number of cladograms of each shape.
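
The count $n!/2^k$ of Lemma 31 is easy to compute for a given shape. The
following sketch is illustrative only: shapes are nested pairs with the
hypothetical leaf marker `S`, and two subtrees are compared after sorting
children recursively into a canonical form (the helper `canon` is an
assumption of the example, not a construction from this report).

```python
from math import factorial

S = "*"  # hypothetical marker for a leaf of a tree shape


def size(t):
    """Number of leaves of a tree given as nested pairs."""
    return 1 if t == S else size(t[0]) + size(t[1])


def canon(t):
    """Canonical form of a thin tree shape: children ordered recursively."""
    if t == S:
        return S
    a, b = canon(t[0]), canon(t[1])
    # repr gives a total order on mixed strings and tuples.
    return (a, b) if repr(a) <= repr(b) else (b, a)


def symmetric_branch_points(t):
    """Number of branch points whose two subtrees have the same shape."""
    if t == S:
        return 0
    k = symmetric_branch_points(t[0]) + symmetric_branch_points(t[1])
    return k + (canon(t[0]) == canon(t[1]))


def cladogram_count(t):
    """Number of cladograms with shape t, namely n! / 2^k."""
    return factorial(size(t)) // 2 ** symmetric_branch_points(t)


print(cladogram_count(((S, S), (S, S))))   # balanced shape on 4 leaves
print(cladogram_count((((S, S), S), S)))   # comb shape on 4 leaves
```

The two four-leaf shapes account for 3 + 12 = 15 cladograms, which is the
total number of cladograms with four leaves.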
So finally: 

Proposition 32 Suppose that $(P_i)_i$ is a Markovian self-similar sequence of
probabilities on cladograms with split distribution q such that if
$F_l(t_1) = F_l(t_2)$ then $P_n(t_1) = P_n(t_2)$. (In other words, any two
cladograms with the same shape have the same probability.) Then if t is a
cladogram with n leaves and k symmetric branch points:

$$P_n(t) = \frac{2^k}{n!} \prod_{\{a,b\} \in splits(t)} q\{a, b\}$$

Proof.

Similarly to the previous proof: by Definition 26 the sequence
$(F_l(P_i))_i$ is a sequence of Markovian self-similar probabilities on tree
shapes (unlabeled rooted binary trees). By Lemma 31 the number of cladograms
with the same shape as t (i.e. such that $F_l(t') = F_l(t)$) is
$\frac{n!}{2^k}$ where k is the number of symmetric branch points of t.

Since $F_l(t_1) = F_l(t_2)$ implies $P_n(t_1) = P_n(t_2)$ and the pre-image
of $F_l(t)$ under $F_l$ has size $\frac{n!}{2^k}$ it follows that
$P_n(t) = \frac{2^k}{n!} F_l(P_n)(F_l(t))$. Since the map $F_l$ preserves the
family of splits of a tree and the split distribution of a Markovian
self-similar sequence, this gives:
$P_n(t) = \frac{2^k}{n!} \prod_{\{a,b\} \in splits(t)} q\{a, b\}$
as desired. □



3.6 The probability of a tree under the alpha model 

There is little more to say in the special case of the alpha model. Since the
four alpha models satisfy the conditions, respectively, of Propositions 28,
29, 30 and 32, the probability of a tree (of the appropriate type) under one
of these models is given by the formulae in those propositions.

Figure 11: A tree shape with probability
$\frac{2(1 - \alpha)(8 - \alpha)}{(5 - \alpha)(4 - \alpha)(3 - \alpha)}$

For example, the tree shape in Figure 11 has family of splits

{{4,2}, {1,1}, {1,3}, {2,1}, {1,1}}

Therefore, by Proposition 29, the probability of this tree shape under the
alpha model on tree shapes is

$$\prod_{\{a,b\} \in \{\{4,2\},\{1,1\},\{1,3\},\{2,1\},\{1,1\}\}} q\{a, b\}$$

Using the formula of Lemma 27 for the split distribution of the alpha model,
and recalling that $q\{a, b\} = q(a, b) + q(b, a)$ if $a \neq b$ and
$q\{a, a\} = q(a, a)$, this is equal to:

$$\frac{(1 - \alpha)(8 - \alpha)}{(5 - \alpha)(4 - \alpha)} \times 1 \times \frac{2}{3 - \alpha} \times 1 \times 1$$

which simplifies to

$$\frac{2(1 - \alpha)(8 - \alpha)}{(5 - \alpha)(4 - \alpha)(3 - \alpha)}$$

The appendix contains a list of all tree shapes with up to 7 leaves, along
with the probability of each under the alpha model.
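
The worked example can be reproduced mechanically. The sketch below assumes
the closed form of $q_\alpha$ from Lemma 27 and the product formula for thin
tree shapes; the nested-pair encoding, the marker `S` and all function names
are hypothetical choices for the illustration, and $\alpha < 1$ is assumed so
that $\Gamma_\alpha$ does not vanish.

```python
from fractions import Fraction
from math import comb

S = "*"  # hypothetical marker for a leaf of a tree shape


def size(t):
    """Number of leaves of a tree given as nested pairs."""
    return 1 if t == S else size(t[0]) + size(t[1])


def gamma_alpha(n, alpha):
    """Gamma_alpha(n) = (n-1-alpha)...(1-alpha), with Gamma_alpha(1) = 1."""
    out = Fraction(1)
    for k in range(1, n):
        out *= k - alpha
    return out


def q_alpha(a, b, alpha):
    """Ordered conditional split distribution q_alpha(a, b), alpha < 1."""
    n = a + b
    pref = gamma_alpha(a, alpha) * gamma_alpha(b, alpha) / gamma_alpha(n, alpha)
    return pref * (alpha / 2 * comb(n, a) + (1 - 2 * alpha) * comb(n - 2, a - 1))


def q_unordered(a, b, alpha):
    """q{a, b} = q(a, b) + q(b, a) if a != b, and q(a, a) otherwise."""
    if a == b:
        return q_alpha(a, a, alpha)
    return q_alpha(a, b, alpha) + q_alpha(b, a, alpha)


def shape_probability(t, alpha):
    """Probability of a thin tree shape: product of q{a, b} over its splits."""
    if t == S:
        return Fraction(1)
    t1, t2 = t
    first = q_unordered(size(t1), size(t2), alpha)
    return first * shape_probability(t1, alpha) * shape_probability(t2, alpha)


# A shape with splits {4,2}, {1,1}, {1,3}, {2,1}, {1,1}.
example = ((S, ((S, S), S)), (S, S))
alpha = Fraction(1, 3)
closed_form = 2 * (1 - alpha) * (8 - alpha) / ((5 - alpha) * (4 - alpha) * (3 - alpha))
print(shape_probability(example, alpha), closed_form)
```

Evaluating at several rational values of $\alpha$ is a cheap way to catch
algebra slips in hand simplifications like the one above.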

3.7 Deletion stability 

This section addresses the definition of deletion stability, and provides proofs 
that the alpha models have this property. 

Informally, deletion stability of a sequence of probabilities
$(P_i)_{i=0}^{\infty}$ on fat or thin cladograms means that picking a random
cladogram with n leaves from $P_n$ and deleting leaf n gives a random
cladogram with $n - 1$ leaves distributed as $P_{n-1}$. Similarly, a sequence
of probabilities $(P_i)_{i=0}^{\infty}$ on fat or thin tree shapes is deletion
stable if picking a random tree shape with n leaves from $P_n$ and deleting a
random leaf gives a random tree shape with $n - 1$ leaves distributed as
$P_{n-1}$.

The formal definition of deletion stability requires a formal definition of 
these deletions. 

Let D be the function which deletes a random leaf of a binary rooted 
tree. A recursive definition of D is given as this form is most convenient for 
the proofs which follow. 

For a tree shape or cladogram t, let $|t|$ denote the number of leaves of t,
also called the size of t.

Definition 33 Let t be a fat or thin tree shape.

• If t is the empty tree then so is D(t).

• If t has one leaf then D(t) is the empty tree shape.

• If t has more than one leaf then $t = t_1 * t_2$ for non-empty $t_1, t_2$,
so let

$$D(t) = \frac{|t_1|}{|t_1| + |t_2|} D(t_1) * t_2 + \frac{|t_2|}{|t_1| + |t_2|} t_1 * D(t_2)$$

Proposition 34 The function D is well defined.

Proof.

If t is a fat tree with more than one leaf then, by Lemma 11, $t = t_1 * t_2$
for a unique pair of non-empty trees, $(t_1, t_2)$. Thus D(t) is well defined.
If t is a thin tree then, by Lemma 11, $t = t_1 * t_2 = t_2 * t_1$ for a
unique set of two non-empty trees, $\{t_1, t_2\}$. Since the root join
operation is commutative for thin tree shapes,

$$D(t_1 * t_2) = \frac{|t_1|}{|t_1| + |t_2|} D(t_1) * t_2 + \frac{|t_2|}{|t_1| + |t_2|} t_1 * D(t_2)
  = \frac{|t_2|}{|t_1| + |t_2|} D(t_2) * t_1 + \frac{|t_1|}{|t_1| + |t_2|} t_2 * D(t_1) = D(t_2 * t_1)$$

and so D(t) is well defined. □
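
The recursive definition of D translates directly into code. The sketch below
is a hedged illustration for fat tree shapes: trees are nested pairs, `S` and
`EMPTY` are hypothetical markers for the one leaf tree and the empty tree,
and the helper `join` assumes that the root join of a tree with the empty
tree is that tree itself.

```python
from collections import defaultdict
from fractions import Fraction

S = "*"       # hypothetical marker for the one-leaf tree
EMPTY = None  # hypothetical marker for the empty tree


def size(t):
    """Number of leaves, with the empty tree having none."""
    if t is EMPTY:
        return 0
    return 1 if t == S else size(t[0]) + size(t[1])


def join(t1, t2):
    """Root join; joining with the empty tree is assumed to give the other tree."""
    if t1 is EMPTY:
        return t2
    if t2 is EMPTY:
        return t1
    return (t1, t2)


def delete_random_leaf(t):
    """Distribution D(t) as {tree: probability}, following the recursion."""
    if t is EMPTY or t == S:
        return {EMPTY: Fraction(1)}
    t1, t2 = t
    n1, n2 = size(t1), size(t2)
    out = defaultdict(Fraction)
    # The deleted leaf lies in t1 with probability |t1| / (|t1| + |t2|).
    for u, p in delete_random_leaf(t1).items():
        out[join(u, t2)] += p * Fraction(n1, n1 + n2)
    # Otherwise it lies in t2.
    for u, p in delete_random_leaf(t2).items():
        out[join(t1, u)] += p * Fraction(n2, n1 + n2)
    return dict(out)


print(delete_random_leaf(((S, S), (S, S))))
print(delete_random_leaf((((S, S), S), S)))
```

Each leaf is deleted with probability 1/n overall, because the factors
$|t_i| / (|t_1| + |t_2|)$ multiply down the path to the leaf.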

The next obvious result is that the operation D is respected by the map,
$F_o$, which forgets vertex orientations: taking fat tree shapes to thin tree
shapes.

Proposition 35 If t is a fat tree shape then $D(F_o(t)) = F_o(D(t))$.

Proof.

If t is a fat tree shape with one leaf then $F_o(t)$ is a thin tree shape with
one leaf, D(t) is the empty fat tree shape, and so both $F_o(D(t))$ and
$D(F_o(t))$ are the empty thin tree shape.

Suppose that t has more than one leaf and that the statement is true for all
fat tree shapes with fewer leaves than t. Now by Lemma 11, $t = t_1 * t_2$ for
a unique pair of non-empty trees $(t_1, t_2)$ and by Proposition 10,
$F_o(t) = F_o(t_1) * F_o(t_2)$. Thus

$$F_o(D(t)) = F_o\left( \frac{|t_1|}{|t_1| + |t_2|} D(t_1) * t_2 + \frac{|t_2|}{|t_1| + |t_2|} t_1 * D(t_2) \right)$$

$$= \frac{|t_1|}{|t_1| + |t_2|} F_o(D(t_1)) * F_o(t_2) + \frac{|t_2|}{|t_1| + |t_2|} F_o(t_1) * F_o(D(t_2))$$

This is equal to $D(F_o(t))$, since by the inductive assumption
$F_o(D(t_1)) = D(F_o(t_1))$ and $F_o(D(t_2)) = D(F_o(t_2))$, and the map
$F_o$ leaves the number of leaves of a tree unchanged. The result follows by
induction. □

Now for the case of labeled trees. The following is the definition of a
function, $D_x$, which deletes every leaf labeled x.

Definition 36 Let t be a fat or thin labeled binary rooted tree.

• If t is the empty tree then $D_x(t)$ is also the empty tree.

• In the case where t has one leaf: if this leaf is labeled x then $D_x(t)$
is the empty tree, otherwise $D_x(t) = t$.

• In the case where t has more than one leaf: $t = t_1 * t_2$ so define
$D_x(t) = D_x(t_1) * D_x(t_2)$.

Again, Lemma 11 guarantees that $D_x$ is well defined.

It is now shown that, for a random tree picked from a probability on
cladograms invariant under permutation of leaf labels (such as the alpha
model on cladograms), first deleting a specified leaf and then forgetting leaf
labels is the same as first forgetting leaf labels and then deleting a random
leaf.

Proposition 37 If P is a probability on leaf-labeled (fat or thin) rooted
binary trees with n leaves which is invariant under any permutation of its
leaf labels and such that all trees with positive probability have every leaf
uniquely labeled and have a leaf labeled x, then $F_l(D_x(P)) = D(F_l(P))$.

Proof.

First for the fat case. If n = 1 then a random tree t from P has one leaf,
and this leaf is labeled x and $D_x(t)$ is the empty tree. Thus
$F_l(D_x(P)) = D(F_l(P))$, the unique probability on the set of tree shapes
with 0 leaves, i.e. the empty tree.

Suppose that the result is true for all trees with fewer than n leaves.

Let t be a random tree picked from P, conditioned such that t has first
split (a, b) (respectively {a, b} in the thin case). This implies that
$t = t_1 * t_2$ for a random pair of trees $(t_1, t_2)$ such that $|t_1| = a$
and $|t_2| = b$. Now $D_x(t) = D_x(t_1) * D_x(t_2)$. The leaf labeled x is a
leaf of $t_1$ with probability $\frac{|t_1|}{|t_1| + |t_2|}$, and in this case
x is not a leaf of $t_2$ and so $D_x(t) = D_x(t_1) * t_2$. With probability
$\frac{|t_2|}{|t_1| + |t_2|}$, the leaf labeled x is a leaf of $t_2$ and not
of $t_1$ and in this case $D_x(t) = t_1 * D_x(t_2)$.

Note that, conditional on the set of leaf labels they have, $t_1$ and $t_2$
are each random trees which are invariant under any permutation of their
respective leaf labels. Thus, applying $F_l$ to t and using the inductive
assumption shows that

$$F_l(D_x(t)) = \frac{|t_1|}{|t_1| + |t_2|} D(F_l(t_1)) * F_l(t_2) + \frac{|t_2|}{|t_1| + |t_2|} F_l(t_1) * D(F_l(t_2)) = D(F_l(t))$$

as desired.

Combining these conditioned results over all possible first splits gives the
desired inductive step.

The thin case now follows by commutativity of the forgetful functions,
Proposition 35 and the symmetry of Definition 36. □



Definition 38 A sequence of probabilities $(P_n)_{n=0}^{\infty}$, such that
$P_n$ is a probability on (fat or thin) cladograms with n leaves, is called
deletion stable if $D_n(P_n) = P_{n-1}$ for all $n \geq 1$.

Definition 39 A sequence of probabilities $(P_n)_{n=0}^{\infty}$, such that
$P_n$ is a probability on (fat or thin) tree shapes with n leaves, is called
deletion stable if $D(P_n) = P_{n-1}$ for all $n \geq 1$.

Corollary 40 If $(P_n)_{n=0}^{\infty}$ is a sequence, such that $P_n$ is a
probability on (fat or thin) cladograms with n leaves, which is deletion
stable and invariant under permutation of leaf labels then the sequence
$(F_l(P_n))_{n=0}^{\infty}$ of probabilities on (fat or thin) tree shapes is
deletion stable.

Proof.

This follows directly from the previous two definitions and Proposition 37. □



3.8 Deletion stability and conditional split probabilities

In the case of Markovian self-similar probabilities, deletion stability is equiv- 
alent to the conditional split distribution satisfying a simple 'consistency 
condition'. This condition is used in the next section to show that the alpha 
models are deletion stable. 

Recall that if $q$ is a conditional split probability then it must satisfy
$\sum_{m=1}^{n-1} q(m, n-m) = 1$ for all integers $n \geq 2$.

Proposition 41 Let $S = (P_n)_{n=0}^{\infty}$ be a sequence such that $P_n$ is
a probability on (fat or thin) tree shapes with $n$ leaves, or a sequence such
that $P_n$ is a probability on (fat or thin) cladograms which is invariant
under permutations of leaf labels. If $S$ is Markovian self-similar then it is
deletion stable if and only if it has a conditional split distribution $q$
satisfying

$$ q(x,y)\left(1 - \frac{q(1,x+y) + q(x+y,1)}{x+y+1}\right) = \frac{(x+1)\,q(x+1,y) + (y+1)\,q(x,y+1)}{x+y+1} $$

for all integers $x, y \geq 1$.

Proof.

First to reduce the cladogram cases to the tree shape cases. If
$S = (P_n)_{n=0}^{\infty}$ is a sequence on fat or thin cladograms then, by
Corollary 40, this sequence is deletion stable if and only if
$(F_l(P_n))_{n=0}^{\infty}$ is deletion stable. Since forgetting leaf labels
leaves the conditional split distribution unchanged (Definition 26), proving
the cladogram cases reduces to proving the cases of fat or thin tree shapes.

In the case of fat tree shapes, the conditional split distribution, $q$, of $S$
is uniquely defined (Proposition 25). In the case of thin tree shapes, there is
a unique conditional split distribution $q$ of $S$ which is symmetric in the
sense that $q(a,b) = q(b,a)$ (Proposition 25). Take this split distribution $q$.

By the definition of the conditional split distribution, the probability
measure $P_{n+1} = \sum_{m=1}^{n} q(m, n+1-m)\, P_m * P_{n+1-m}$ for all
$n \geq 1$ and so

$$ D(P_{n+1}) = \sum_{m=1}^{n} q(m, n+1-m)\, D(P_m * P_{n+1-m}) $$

for all $n \geq 1$. By the definition of $D$ this is equal to:

$$ \sum_{m=1}^{n} q(m, n+1-m)\left(\frac{m}{n+1}\, D(P_m) * P_{n+1-m} + \frac{n+1-m}{n+1}\, P_m * D(P_{n+1-m})\right) $$

Noting that $P_k * P_0 = P_0 * P_k$ for all $k$, this expression may be
rearranged into

$$ \frac{P_n}{n+1}\bigl(q(1,n) + q(n,1)\bigr) + \sum_{m=1}^{n-1}\left( q(m+1, n-m)\frac{m+1}{n+1}\, D(P_{m+1}) * P_{n-m} + q(m, n-m+1)\frac{n-m+1}{n+1}\, P_m * D(P_{n-m+1}) \right) $$

Let $c_{n+1} = 1 - \frac{1}{n+1}\bigl(q(1,n) + q(n,1)\bigr)$.

Thus, if $S = (P_n)_{n=0}^{\infty}$ is deletion stable then since
$D(P_{k+1}) = P_k$ for all $k \geq 0$ it follows that:

$$ c_{n+1} P_n = \sum_{m=1}^{n-1}\left( q(m+1, n-m)\frac{m+1}{n+1} + q(m, n-m+1)\frac{n-m+1}{n+1} \right) P_m * P_{n-m} $$

Note that if $q(a,b) = q(b,a)$ then the coefficients of $P_m * P_{n-m}$ and
$P_{n-m} * P_m$ on the right hand side are equal.

Thus, the uniqueness of the conditional split distribution $q$ in the case
of fat tree shapes, and the uniqueness of the symmetric conditional split
distribution in the case of thin tree shapes, implies that

$$ q(m, n-m)\, c_{n+1} = \frac{m+1}{n+1}\, q(m+1, n-m) + \frac{n-m+1}{n+1}\, q(m, n-m+1) $$

for all integers $n, m \geq 1$ such that $m < n$.

On the other hand, suppose that $q$ is a conditional split distribution of
$S = (P_n)_{n=0}^{\infty}$ which satisfies the equation in the statement of
this proposition. Induction shows that $(P_n)_{n=0}^{\infty}$ is deletion
stable as follows:

It is always true that $D(P_1) = P_0$ and $D(P_2) = P_1$ as there are unique
(fat or thin) tree shapes with 0, 1 and 2 leaves. Suppose that
$D(P_k) = P_{k-1}$ for all $k \in \{1, 2, \ldots, n\}$. Then, by the
computations above,

$$ D(P_{n+1}) = \frac{P_n}{n+1}\bigl(q(1,n) + q(n,1)\bigr) + \sum_{m=1}^{n-1}\left( q(m+1, n-m)\frac{m+1}{n+1}\, D(P_{m+1}) * P_{n-m} + q(m, n-m+1)\frac{n-m+1}{n+1}\, P_m * D(P_{n-m+1}) \right) $$

which by the inductive assumption is equal to:

$$ \frac{P_n}{n+1}\bigl(q(1,n) + q(n,1)\bigr) + \sum_{m=1}^{n-1} \frac{(m+1)\,q(m+1, n-m) + (n-m+1)\,q(m, n-m+1)}{n+1}\, P_m * P_{n-m} $$

By the definition of the conditional split distribution $q$ and the
assumption that it satisfies the equations given in the statement of the
proposition, the sum equals $c_{n+1} P_n$ and so this expression is equal to
$P_n$. Thus $D(P_{n+1}) = P_n$, and so by induction this holds for all
$n \geq 0$. □
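As a quick sanity check of the consistency condition (a worked example added here, not part of the original argument), take the Yule split distribution $q(x,y) = \frac{1}{x+y-1}$, which is derived in Section 3.12. Writing $n = x + y$, both sides of the condition reduce to the same expression:

```latex
\begin{align*}
q(x,y)\Bigl(1 - \frac{q(1,n) + q(n,1)}{n+1}\Bigr)
  &= \frac{1}{n-1}\Bigl(1 - \frac{2}{n(n+1)}\Bigr)
   = \frac{(n+2)(n-1)}{(n-1)\,n\,(n+1)}
   = \frac{n+2}{n(n+1)}, \\
\frac{(x+1)\,q(x+1,y) + (y+1)\,q(x,y+1)}{n+1}
  &= \frac{(x+1) + (y+1)}{n\,(n+1)}
   = \frac{n+2}{n(n+1)}.
\end{align*}
```

Here $q(1,n) = q(n,1) = \frac{1}{n}$ and $q(x+1,y) = q(x,y+1) = \frac{1}{n}$, so the Yule model is deletion stable, as expected.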



3.9 The case of the alpha model 

Proposition 42 All four of the alpha models are deletion stable for every 
value of alpha in [0, 1]. 

Proof. 

Recall that the alpha models on fat and thin cladograms are invariant under
permutations of leaf labels. Lemma 27 states that all four of the alpha models
are Markovian self-similar and that the conditional split distributions,
$q_\alpha$, for the alpha models are given by

$$ q_\alpha(a,b) = \frac{\Gamma_\alpha(a)\Gamma_\alpha(b)}{\Gamma_\alpha(a+b)} \left( \frac{\alpha}{2}\binom{a+b}{a} + (1-2\alpha)\binom{a+b-2}{a-1} \right) $$

$$ \phantom{q_\alpha(a,b)} = \frac{\Gamma_\alpha(a)\Gamma_\alpha(b)}{\Gamma_\alpha(a+b)} \binom{a+b}{a} \left( \frac{\alpha}{2} + (1-2\alpha)\frac{ab}{(a+b)(a+b-1)} \right) $$

It remains to show that $q_\alpha$ satisfies the equations given in
Proposition 41 for all values of $\alpha$ in $[0, 1]$.

Let $a, b \geq 1$, and $n = a + b$. Let
$A = q_\alpha(a+1, b)\frac{a+1}{a+b+1} + q_\alpha(a, b+1)\frac{b+1}{a+b+1}$.
It is sufficient to show that
$A = \left(1 - \frac{1}{n+1}\bigl(q_\alpha(1,n) + q_\alpha(n,1)\bigr)\right) q_\alpha(a,b)$.

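Expanding $A$ and rearranging gives:

$$ A = \frac{\Gamma_\alpha(a+1)\Gamma_\alpha(b)}{\Gamma_\alpha(a+b+1)} \left( \frac{\alpha}{2}\binom{a+b+1}{a+1} + (1-2\alpha)\binom{a+b+1-2}{a} \right) \frac{a+1}{a+b+1} $$

$$ \quad + \frac{\Gamma_\alpha(a)\Gamma_\alpha(b+1)}{\Gamma_\alpha(a+b+1)} \left( \frac{\alpha}{2}\binom{a+b+1}{b+1} + (1-2\alpha)\binom{a+b+1-2}{b} \right) \frac{b+1}{a+b+1} $$

$$ = \frac{\Gamma_\alpha(a)\Gamma_\alpha(b)}{\Gamma_\alpha(a+b+1)} \binom{a+b}{a} \left( (a-\alpha)\frac{\alpha}{2} + (a-\alpha)(1-2\alpha)\frac{(a+1)b}{(a+b+1)(a+b)} \right) $$

$$ \quad + \frac{\Gamma_\alpha(a)\Gamma_\alpha(b)}{\Gamma_\alpha(a+b+1)} \binom{a+b}{a} \left( (b-\alpha)\frac{\alpha}{2} + (b-\alpha)(1-2\alpha)\frac{(b+1)a}{(a+b+1)(a+b)} \right) $$

Let $C = \frac{\Gamma_\alpha(a)\Gamma_\alpha(b)}{\Gamma_\alpha(a+b+1)} \binom{a+b}{a} \frac{1}{a+b+1}$ so that

$$ A = C \times \left( \frac{\alpha}{2}(a+b-2\alpha)(a+b+1) + (1-2\alpha)\frac{(a-\alpha)(a+1)b + (b-\alpha)(b+1)a}{a+b} \right) $$

which may be rearranged into:

$$ A = C \times \frac{(a+b+2(1-\alpha))\bigl(\alpha(a+b)(a+b-1) + 2(1-2\alpha)ab\bigr)}{2(a+b)} $$

Now, $1 - \frac{1}{n+1}\bigl(q_\alpha(1,n) + q_\alpha(n,1)\bigr) = 1 - \frac{2}{(n+1)(n-\alpha)}\left(\frac{\alpha}{2}(n+1) + (1-2\alpha)\right)$.
Since $n = a + b$, this is equal to
$\frac{(a+b-1)(a+b+2(1-\alpha))}{(a+b+1)(a+b-\alpha)}$.

Expanding $q_\alpha(a, b)$ and rearranging gives:

$$ q_\alpha(a,b) = \frac{\Gamma_\alpha(a)\Gamma_\alpha(b)}{\Gamma_\alpha(a+b+1)} \binom{a+b}{a} \frac{1}{a+b+1} \times (a+b-\alpha)(a+b+1) \left( \frac{\alpha}{2} + (1-2\alpha)\frac{ab}{(a+b)(a+b-1)} \right) $$

$$ \phantom{q_\alpha(a,b)} = C \times \frac{(a+b-\alpha)(a+b+1)\bigl(\alpha(a+b)(a+b-1) + 2(1-2\alpha)ab\bigr)}{2(a+b)(a+b-1)} $$

Thus $\left(1 - \frac{1}{n+1}\bigl(q_\alpha(1,n) + q_\alpha(n,1)\bigr)\right) q_\alpha(a,b) = C \times \frac{(a+b+2(1-\alpha))\bigl(\alpha(a+b)(a+b-1) + 2(1-2\alpha)ab\bigr)}{2(a+b)} = A$
as desired.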

Thus the conditions of Proposition 41 are satisfied for all four alpha
models, and so they are deletion stable. □

Although perfectly correct, the above proof does not provide a good
intuitive sense of why the alpha models are deletion stable.

One answer to this is to view the alpha models as the stationary
distributions of the delete-alpha-insert Markov chains. These will be
discussed in a subsequent paper.
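The algebra above can be cross-checked with exact rational arithmetic. The following is a minimal sketch, not part of the original report: the function names are hypothetical, $\Gamma_\alpha(n)$ is taken to be the product $(n-1-\alpha)(n-2-\alpha)\cdots(1-\alpha)$ with $\Gamma_\alpha(1) = 1$, and $\alpha$ is restricted to $[0, 1)$ because $\Gamma_1(n)$ vanishes for $n \geq 2$.

```python
from fractions import Fraction
from math import comb


def gamma_alpha(alpha, n):
    """Gamma_alpha(n) = (n-1-alpha)...(1-alpha); the empty product gives 1 for n = 1."""
    result = Fraction(1)
    for k in range(1, n):
        result *= k - alpha
    return result


def q_alpha(alpha, a, b):
    """Conditional split probability q_alpha(a, b) of the alpha model (alpha < 1)."""
    alpha = Fraction(alpha)
    prefactor = gamma_alpha(alpha, a) * gamma_alpha(alpha, b) / gamma_alpha(alpha, a + b)
    return prefactor * (alpha / 2 * comb(a + b, a)
                        + (1 - 2 * alpha) * comb(a + b - 2, a - 1))


def consistency_gap(alpha, a, b):
    """Left side minus right side of the consistency condition at (a, b)."""
    n = a + b
    lhs = q_alpha(alpha, a, b) * (1 - (q_alpha(alpha, 1, n) + q_alpha(alpha, n, 1)) / (n + 1))
    rhs = (Fraction(a + 1, n + 1) * q_alpha(alpha, a + 1, b)
           + Fraction(b + 1, n + 1) * q_alpha(alpha, a, b + 1))
    return lhs - rhs


if __name__ == "__main__":
    for alpha in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(9, 10)):
        worst = max(abs(consistency_gap(alpha, a, b))
                    for a in range(1, 8) for b in range(1, 8))
        print(f"alpha = {alpha}: largest gap = {worst}")
```

Using `Fraction` rather than floats means a gap of exactly zero is meaningful, so the check is a genuine (if finite) confirmation rather than a numerical approximation.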



3.10 A note on multifurcating trees 

The general definitions and results of this chapter may all be extended to
multifurcating trees. In particular, the definitions of Markovian
self-similarity, conditional split distribution, deletion of a uniform random
leaf or labeled leaf, and deletion stability all extend in the obvious way.
There is also a natural extension of Proposition 41 to the case of
multifurcating trees. The conditions on the split distribution are natural
extensions of those for binary trees.

For the sake of brevity, this material is omitted. 



3.11 Other probabilities on Cladograms 

The alpha models are some of many different probabilities on cladograms
and tree shapes. The most popular and well known of these are the Yule,
Uniform and Comb models. These three are also Markovian self-similar and
deletion stable. The only other known models with these properties are the
alpha model described here and the beta model of Aldous.

A major attraction of the alpha model is that it interpolates smoothly
between the Yule, Uniform and Comb models. The beta model of Aldous
also interpolates between these three and extends beyond the Yule model to
give models with very flat trees.

These models are now briefly discussed and compared with the alpha 
model. 



3.12 The Yule, Uniform and Comb models 

The Yule model, or neutral evolution model, was first defined by Yule in 1924
[30]. It may be described in many different ways. The most convenient
description here is the following (see [7]): Starting with a single
species/leaf, at each step choose one of the extant species to bifurcate
(split into two species) until the required number of species is reached.

The Uniform model is simply the uniform distribution on cladograms of a 
given size. It is well known that there are (2n — 3)!! cladograms with exactly 
n > 2 leaves. See ^2] for example. 

The Comb model is the sequence of probabilities which assign probability 
1 to the most asymmetric tree of each size, called the comb tree. 

From the above description of the Yule model, it is clear that this is
precisely the alpha model with $\alpha = 0$ since every new leaf is inserted
at a uniform random leaf edge. Similarly, a simple induction shows that
setting $\alpha = 1/2$ gives the Uniform model, as the next leaf is inserted
at a uniformly chosen edge. Finally, setting $\alpha = 1$ gives the Comb model
since every new leaf is inserted at a uniform random internal edge.

A more formal proof of this fact goes as follows: 

When $\alpha = 0$ the conditional split distribution of the alpha model is

$$ q_0(a,b) = \frac{\Gamma_0(a)\Gamma_0(b)}{\Gamma_0(a+b)} \left( \frac{0}{2}\binom{a+b}{a} + (1-0)\binom{a+b-2}{a-1} \right) $$

$$ \phantom{q_0(a,b)} = \frac{(a-1)!\,(b-1)!}{(a+b-1)!} \cdot \frac{(a+b-2)!}{(a-1)!\,(b-1)!} = \frac{1}{a+b-1} $$

which is the split distribution of the Yule model.

For the case of the Uniform model, a simple counting argument shows
that the conditional split distribution satisfies
$q(a,b) = \frac{1}{2}\binom{a+b}{a}\frac{c_a c_b}{c_{a+b}}$, where
$c_n = (2n-3)!!$ is the number of cladograms with $n$ leaves.

When $\alpha = \frac{1}{2}$ the conditional split distribution for the alpha
model is:

$$ q_{1/2}(a,b) = \frac{\Gamma_{1/2}(a)\Gamma_{1/2}(b)}{\Gamma_{1/2}(a+b)} \left( \frac{1}{4}\binom{a+b}{a} + \left(1 - 2 \times \frac{1}{2}\right)\binom{a+b-2}{a-1} \right) $$

$$ = \frac{(a-1-\frac{1}{2})\cdots(1-\frac{1}{2})\;(b-1-\frac{1}{2})\cdots(1-\frac{1}{2})}{(a+b-1-\frac{1}{2})\cdots(1-\frac{1}{2})} \cdot \frac{1}{4}\binom{a+b}{a} $$

$$ = \frac{1}{2}\binom{a+b}{a} \frac{(2a-3)(2a-5)\cdots(3)(1)\;(2b-3)\cdots(3)(1)}{(2(a+b)-3)(2(a+b)-5)\cdots(3)(1)} $$

$$ = \frac{1}{2}\binom{a+b}{a} \frac{(2a-3)!!\,(2b-3)!!}{(2(a+b)-3)!!} = \frac{1}{2}\binom{a+b}{a} \frac{c_a c_b}{c_{a+b}} $$

which is the conditional split distribution of the Uniform model.

When $\alpha = 1$ the conditional split distribution of the alpha model is
$q_1(1,n) = q_1(n,1) = \frac{1}{2}$ for $n > 1$ and $q_1(a,b) = 0$ if neither
$a$ nor $b$ is equal to 1.
This is the conditional split distribution of the Comb model.
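The Yule and Uniform special cases can also be confirmed by direct computation. The sketch below is illustrative only: the helper names are hypothetical, $\Gamma_\alpha$ is again assumed to be the product $(n-1-\alpha)\cdots(1-\alpha)$, and the Comb case is left out because $\Gamma_1(n) = 0$ for $n \geq 2$ makes the formula a $0/0$ expression that has to be read as a limit.

```python
from fractions import Fraction
from math import comb


def gamma_alpha(alpha, n):
    """Gamma_alpha(n) = (n-1-alpha)...(1-alpha), with Gamma_alpha(1) = 1."""
    result = Fraction(1)
    for k in range(1, n):
        result *= k - alpha
    return result


def q_alpha(alpha, a, b):
    """Conditional split probability of the alpha model (alpha < 1)."""
    alpha = Fraction(alpha)
    prefactor = gamma_alpha(alpha, a) * gamma_alpha(alpha, b) / gamma_alpha(alpha, a + b)
    return prefactor * (alpha / 2 * comb(a + b, a)
                        + (1 - 2 * alpha) * comb(a + b - 2, a - 1))


def num_cladograms(n):
    """c_n = (2n-3)!!; the empty product gives c_1 = 1."""
    result = 1
    for odd in range(1, 2 * n - 2, 2):
        result *= odd
    return result


def yule_split(a, b):
    """Split distribution of the Yule model: 1 / (a + b - 1)."""
    return Fraction(1, a + b - 1)


def uniform_split(a, b):
    """Split distribution of the Uniform model: (1/2) C(a+b, a) c_a c_b / c_{a+b}."""
    return Fraction(comb(a + b, a) * num_cladograms(a) * num_cladograms(b),
                    2 * num_cladograms(a + b))


if __name__ == "__main__":
    for a, b in [(1, 1), (1, 3), (2, 2), (3, 5)]:
        print((a, b), q_alpha(0, a, b), yule_split(a, b),
              q_alpha(Fraction(1, 2), a, b), uniform_split(a, b))
```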

Notice that Yule trees tend to be flatter than Uniform trees, which are
of course flatter than the Comb tree. Similarly, the average depth of leaves
in a Yule tree is less than that in a Uniform tree which is less than that in a
Comb tree. These observations can be made more precise using Colless' and
Sackin's index. The inequalities extend to the alpha model and are made
precise in Section 4.



3.13 The beta model of Aldous 

The other probabilities on cladograms of interest are those of the beta model
of David Aldous, described in [1]. Other than the alpha models, the beta
model is the only known family which interpolates between the Yule, Uniform
and Comb models and is Markovian self-similar and deletion stable. The
beta model flows from a different description of the Yule model: uniform
stick breaking. This uniform stick breaking is extended to stick breaking
according to the beta distribution on the unit interval. The conditional split
probabilities which arise are then extended beyond the point where stick
breaking makes sense.

Like the alpha model, the beta model is deletion stable (sampling
consistent) and Markovian self-similar. It is parameterized by a single
variable $\beta \in (-2, \infty]$, passes through the Yule model at
$\beta = 0$, the Uniform model at $\beta = -\frac{3}{2}$, and converges to the
Comb model as $\beta \to -2$. Unlike the alpha model, it extends beyond the
Yule model to give distributions with much flatter trees ($\beta > 0$).

As $\beta \to \infty$ it converges to the model defined by 'perfect 1/2 : 1/2
stick breaking'. This model should be the 'flattest possible' sampling
consistent, Markovian self-similar distribution on cladograms. Here 'flattest
possible' can mean the lowest expected value of either Colless' or Sackin's
index for all sizes of cladogram.



3.14 The alpha model is not the beta model 

Here is a short proof that the alpha and beta models are different, and in
fact only intersect at the Yule, Uniform and Comb models.

The conditional split distribution of the beta model is

$$ q(a,b) = \frac{1}{k_n(\beta)} \frac{\Gamma(\beta+a+1)\,\Gamma(\beta+b+1)}{\Gamma(a+1)\,\Gamma(b+1)} $$

where k n (f3) is a normalizing constant. This is given in jl] and jH]- 
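To make the beta model's split distribution concrete, here is a small numerical sketch. It is not from the original report: the function names are hypothetical, and the normalizing constant $k_n(\beta)$ is assumed to be the sum of the unnormalized weights over all splits $(i, n-i)$ with $1 \leq i \leq n-1$. It recovers the Yule split distribution at $\beta = 0$ and the Uniform one at $\beta = -3/2$.

```python
from math import comb, gamma, isclose


def beta_split(beta, a, b):
    """Conditional split probability q(a, b) of the beta model, for beta > -2."""
    n = a + b

    def weight(i):
        # Unnormalized weight of the split (i, n - i).
        return (gamma(beta + i + 1) * gamma(beta + n - i + 1)
                / (gamma(i + 1) * gamma(n - i + 1)))

    # k_n(beta) is taken to be the sum of the weights over all splits.
    return weight(a) / sum(weight(i) for i in range(1, n))


def num_cladograms(n):
    """c_n = (2n-3)!!, the number of cladograms with n leaves (c_1 = 1)."""
    result = 1
    for odd in range(1, 2 * n - 2, 2):
        result *= odd
    return result


def uniform_split(a, b):
    """Uniform model: (1/2) C(a+b, a) c_a c_b / c_{a+b}."""
    return comb(a + b, a) * num_cladograms(a) * num_cladograms(b) / (2 * num_cladograms(a + b))


if __name__ == "__main__":
    for a, b in [(1, 5), (2, 4), (3, 3)]:
        print((a, b),
              isclose(beta_split(0.0, a, b), 1 / (a + b - 1)),
              isclose(beta_split(-1.5, a, b), uniform_split(a, b)))
```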

Theorem 43 The alpha model and the beta model intersect only at the Yule, 
Uniform and Comb models. 

Proof. 

Since it has already been shown that both models pass through the Yule, 
Uniform and Comb models, all that remains is to show that they do not 
intersect at any other point. It is sufficient to show that at no other point 
do the conditional split distributions agree. 

Consider the conditional split distribution for six leaves. To avoid dealing
with the normalization constant in the beta model, take the ratios
$\frac{q(1,5)}{q(2,4)}$ and $\frac{q(2,4)}{q(3,3)}$.

For the alpha model these ratios are, respectively,
$\frac{2(1+\alpha)(4-\alpha)}{(1-\alpha)(8-\alpha)}$ and
$\frac{8-\alpha}{4(2-\alpha)}$. For the beta model these ratios are,
respectively, $\frac{\beta+5}{\beta+2}\cdot\frac{2}{5}$ and
$\frac{\beta+4}{\beta+3}\cdot\frac{3}{4}$.

Equating the first ratio of split probabilities gives:

$$ \frac{2(1+\alpha)(4-\alpha)}{(1-\alpha)(8-\alpha)} = \frac{\beta+5}{\beta+2}\cdot\frac{2}{5} $$

Solving for $\beta$ gives

$$ \beta = \frac{15\alpha(5-\alpha)}{6(\alpha^2 - 4\alpha - 2)} $$

Equating the second ratio of split probabilities gives:

$$ \frac{8-\alpha}{4(2-\alpha)} = \frac{\beta+4}{\beta+3}\cdot\frac{3}{4} $$

Solving for $\beta$ gives:

$$ \beta = \frac{-9\alpha}{2(1+\alpha)} $$

Thus, if the two models are equal it must be that:

$$ \frac{15\alpha(5-\alpha)}{6(\alpha^2 - 4\alpha - 2)} = \frac{-9\alpha}{2(1+\alpha)} $$

In other words:

$$ \frac{-2\alpha^3 + 8\alpha^2 - \frac{7}{2}\alpha}{\alpha^3 - 3\alpha^2 - 6\alpha - 2} = 0 $$

Which happens only if

$$ -2\alpha^3 + 8\alpha^2 - \frac{7}{2}\alpha = 0 $$

Solving for $\alpha$ gives $\alpha = 0, \frac{1}{2}, \frac{7}{2}$.

Since the alpha model is not defined for $\alpha = \frac{7}{2}$ and the other
two values correspond to the Yule and Uniform models this completes the
proof. As a final note, $\alpha = 1$ did not appear as a solution because in
that case (and only that case) the ratios are not real numbers. □
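The six-leaf comparison in this proof is easy to reproduce with exact arithmetic. The sketch below is an illustration with hypothetical function names: it solves the second ratio equation for $\beta$, substitutes that $\beta$ into the first ratio, and reports the mismatch, which should vanish only at $\alpha = 0$ and $\alpha = 1/2$ within $[0, 1)$. The value $\alpha = 4/5$ is avoided because it would require $\beta = -2$, where the beta ratio is undefined.

```python
from fractions import Fraction


def alpha_ratios(alpha):
    """(q(1,5)/q(2,4), q(2,4)/q(3,3)) under the alpha model, for alpha != 1."""
    first = 2 * (1 + alpha) * (4 - alpha) / ((1 - alpha) * (8 - alpha))
    second = (8 - alpha) / (4 * (2 - alpha))
    return first, second


def beta_ratios(beta):
    """The same two ratios under the beta model (normalizing constant cancels)."""
    first = Fraction(2, 5) * (beta + 5) / (beta + 2)
    second = Fraction(3, 4) * (beta + 4) / (beta + 3)
    return first, second


def beta_matching_second_ratio(alpha):
    """The beta that makes the second ratios agree: -9 alpha / (2 (1 + alpha))."""
    return -9 * alpha / (2 * (1 + alpha))


def first_ratio_gap(alpha):
    """Mismatch in the first ratio once the second ratio has been matched."""
    alpha = Fraction(alpha)
    beta = beta_matching_second_ratio(alpha)
    return alpha_ratios(alpha)[0] - beta_ratios(beta)[0]


def cubic(alpha):
    """4 alpha^3 - 16 alpha^2 + 7 alpha, a rescaling of the cubic in the proof."""
    alpha = Fraction(alpha)
    return 4 * alpha**3 - 16 * alpha**2 + 7 * alpha


if __name__ == "__main__":
    for alpha in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
        print(f"alpha = {alpha}: first-ratio gap = {first_ratio_gap(alpha)}")
```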



4 Sackin's index and Colless' index 

This section addresses two common statistics of tree shape: Sackin's index
and Colless' index. Sackin's index is the sum of the depth of all leaves in the
tree. In other words, the sum of the distance between the root and each leaf.
Colless' index is computed as follows: For each internal vertex, compute the
absolute value of the difference between the number of leaves below each of
the two children, then sum up these numbers.

Sackin's index dates back to a paper of M.J. Sackin in 1972 [21], and
Colless' to a paper of his in 1982 [13]. These indices and others are described
in an excellent paper of Shao and Sokal [22]. Formal symbolic definitions of
each of these indices are given below.

In this chapter, the expected value of both of these indices for a cladogram
of size $n$ chosen according to the alpha model is shown to be
$\Theta(n^{1+\alpha})$ for $\alpha \in (0, 1]$ and $O(n \log n)$ for
$\alpha = 0$. Dividing by $n$ shows that the expected depth of a random leaf
is $\Theta(n^{\alpha})$.

Previous work on these and other indices in the cases of the Yule and 
Uniform models may be found in H31 EU HZ| ED] HH1 [101 HH - 

4.1 Sackin's and Colless' indices defined 

Now for a formal definition of Sackin's index. Denote by S(T) the value 
of Sackin's index and C(T) the value of Colless' index on a tree shape or 
cladogram T. 

Recall that the distance between two vertices in a tree is denoted by d. 

Definition 44 Given a fat or thin tree shape $t$ with root $r$ and leaf set
$s$, Sackin's index for this tree is defined to be
$S(t) = \sum_{v \in s} (d(v,r) - 1)$.

Note that the 'depth' of a leaf is counted from the first branch point rather
than the root: $d(r,v) - 1$ rather than $d(r,v)$. This is because many authors
do not include the root edge in a tree shape, and also to preserve the
alternative definition of Sackin's index given below.

For a vertex $v$ of a tree, let $N_v$ denote the number of leaves below and
including $v$. An equivalent definition of Sackin's index is:

Definition 45 Given a fat or thin tree shape $t$ with internal vertex set $I$,
Sackin's index for this tree is defined to be $S(t) = \sum_{v \in I} N_v$.

Proposition 46 The two preceding definitions of Sackin's index agree. 

Proof. 

Let $t$ be a fat or thin tree shape with root $r$, leaf set $s$ and internal
vertex set $I$. Let $[S]$ denote the indicator function of a statement $S$. In
other words, $[S] = 1$ if $S$ is true and $[S] = 0$ otherwise. Since $t$ is a
tree, the path from a leaf to the root is unique and passes through every
internal vertex above the leaf exactly once. Thus
$\sum_{v \in s} (d(v,r) - 1) = \sum_{v \in s} \sum_{u \in I} [u \text{ is above } v]$.
Exchanging the order of summation, this becomes
$\sum_{u \in I} \sum_{v \in s} [u \text{ is above } v]$ which is equal to
$\sum_{u \in I} \sum_{v \in s} [v \text{ is below } u]$ which by the definition
of $N_u$ is equal to $\sum_{u \in I} N_u$.
Thus the two definitions of Sackin's index agree. □

Next to define Colless' index. First some notation is introduced. Every 
internal vertex, v, of a (fat or thin) tree shape has exactly two children. Let 
L v denote the number of leaves below the left child and R v the number of 
leaves below the right child. If the tree shape is thin then choose which child 
is 'left' and which is 'right' arbitrarily. 

Definition 47 Given a (fat or thin) tree shape $t$ with internal vertex set
$I$, Colless' index for this tree shape is defined to be
$C(t) = \sum_{v \in I} |L_v - R_v|$.

Sackin's and Colless' indices for fat or thin cladograms are defined by first 
applying the map which forgets leaf labels and then calculating the index. 

The following identity may be found in and is used later in this 
chapter to show that Sackin's and Colless' indices have asymptotic covariance 
1 for all alpha models except a = 0. 

Lemma 48 If $t$ is a fat or thin tree shape with internal vertex set $I$ then
$C(t) = S(t) - 2\sum_{v \in I} \min(L_v, R_v)$.

Proof.

By the definition of Colless' index,
$C(t) = \sum_{v \in I} |L_v - R_v| = \sum_{v \in I} \bigl(L_v + R_v - 2\min(L_v, R_v)\bigr)$.
Since $N_v = L_v + R_v$ for every internal vertex, $v$, it follows that
$C(t) = \sum_{v \in I} \bigl(N_v - 2\min(L_v, R_v)\bigr) = S(t) - 2\sum_{v \in I} \min(L_v, R_v)$. □
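The two definitions of Sackin's index and the identity of Lemma 48 can be checked exhaustively on small trees. The following is a minimal sketch with an assumed representation that does not appear in the original text: a tree shape is a nested tuple `(left, right)` and a leaf is `None`. All function names are hypothetical.

```python
def all_shapes(n):
    """Yield every ordered ('fat') tree shape with n leaves; None is a leaf."""
    if n == 1:
        yield None
        return
    for i in range(1, n):
        for left in all_shapes(i):
            for right in all_shapes(n - i):
                yield (left, right)


def leaves(t):
    """Number of leaves below and including the top vertex of t (N_v)."""
    return 1 if t is None else leaves(t[0]) + leaves(t[1])


def sackin(t):
    """Definition 45: sum of N_v over all internal vertices v."""
    if t is None:
        return 0
    return leaves(t) + sackin(t[0]) + sackin(t[1])


def sackin_by_depth(t, depth=0):
    """Definition 44: sum over leaves of the number of internal vertices above."""
    if t is None:
        return depth
    return sackin_by_depth(t[0], depth + 1) + sackin_by_depth(t[1], depth + 1)


def colless(t):
    """Definition 47: sum of |L_v - R_v| over all internal vertices v."""
    if t is None:
        return 0
    return abs(leaves(t[0]) - leaves(t[1])) + colless(t[0]) + colless(t[1])


def min_sum(t):
    """Sum of min(L_v, R_v) over all internal vertices v."""
    if t is None:
        return 0
    return min(leaves(t[0]), leaves(t[1])) + min_sum(t[0]) + min_sum(t[1])


if __name__ == "__main__":
    comb4 = (((None, None), None), None)
    balanced4 = ((None, None), (None, None))
    for name, t in (("comb", comb4), ("balanced", balanced4)):
        print(name, sackin(t), colless(t), sackin(t) - 2 * min_sum(t))
```

For the four-leaf comb this gives Sackin's index 9 and Colless' index 3, and for the balanced four-leaf tree 8 and 0; in both cases `colless(t)` equals `sackin(t) - 2 * min_sum(t)`.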

Now, each of these two maps, $S(t)$ and $C(t)$, may be applied to a random
variable on tree shapes to give a real random variable representing the
distribution of each statistic. Let $S_n(\alpha)$ and $C_n(\alpha)$ denote the
random variables arising in this way from a random variable on tree shapes
with $n$ leaves which is distributed according to the alpha model on trees
with $n$ leaves. In other words, for a tree with $n$ leaves chosen randomly
under the alpha model, let $S_n(\alpha)$ denote the distribution of Sackin's
index and let $C_n(\alpha)$ denote the distribution of Colless' index.

These random variables have already been studied in great detail in the
cases of the Yule ($\alpha = 0$) and Uniform ($\alpha = 1/2$) models. Results
for these cases are surveyed in the next subsection. Some of these results are
then generalized to cover all values of alpha in $[0, 1]$.

4.2 Sackin's and Colless' index for alpha trees

In this section, some of the results just quoted will be generalized to all
values of alpha. In particular, the expected value of Sackin's index for an
alpha tree with $n$ leaves is $\Theta(n^{1+\alpha})$ for $\alpha \in (0, 1]$.
This implies that, for $\alpha \in (0, 1]$, Colless' index is also
$\Theta(n^{1+\alpha})$ and the covariance of Sackin's index and Colless' index
is 1.

4.3 The Yule and Uniform cases 

Much is already known about the distribution of Sackin's index and Colless' 
index in the cases of the Yule (a = 0) and Uniform (a = 1/2) models. In 
particular, the mean, variance and covariance are known. In the case of the 
Uniform distribution the limiting distribution, after rescaling is the Airy dis- 
tribution. These results are summarized or proven in the preprints of Blum, 
Francois and Janson [TUj, [HJ. Several papers have presented estimates of 
these values attained by simulation, such as those of Rogers [2T], 

In the case of the Yule model ($\alpha = 0$): The correctly normalized Sackin's
index, $\frac{S_n(0) - E\,S_n(0)}{n}$, converges in distribution as $n$
approaches infinity. The limiting distribution satisfies a fixed-point
equation given by Rösler in [23], and has variance
$\sigma^2 = 7 - \frac{2\pi^2}{3}$.

In the case of Uniform trees ($\alpha = 1/2$):
$\left( \frac{S_n(1/2)}{n^{3/2}}, \frac{C_n(1/2)}{n^{3/2}} \right)$ converges in
distribution to $(A, A)$, where $A$ is the Airy distribution. This is proven in
[TTj . It also follows directly from the work of Aldous on continuum random 
trees: ffl, 0, 0. 

Notice that the mean and variance of $S_n(1/2)$ and $C_n(1/2)$ are both of
order $n^{1+1/2}$ and their covariance tends to 1.

4.4 The expected value of Sackin's index 

Now to show that the expected value of Sackin's index is
$\Theta(n^{1+\alpha})$ for $\alpha \in (0, 1]$. Begin by defining some new
statistics on trees which are close to Sackin's index. Next, find a recurrence
equation which they satisfy and try to solve it.

For a tree shape or cladogram $t$ define:

T(t) = sum of leaf depths
K(t) = sum of depths of all internal vertices
L(t) = sum of the number of internal nodes below and including each internal node

Here the 'depth' of a vertex is the number of edges in the shortest path
between it and the root vertex: $d(r,v)$.

Let $T_\alpha(n)$, $L_\alpha(n)$, and $K_\alpha(n)$ denote the expectations of
these variables under the alpha model on tree shapes with $n$ leaves.

Note that, in the notation of the previous section,
$L(t) = \sum_{v \in I} (N_v - 1)$ since the number of internal nodes below and
including an internal vertex is one less than the number of leaves below it in
a binary tree. Thus $L(t)$ for a tree $t$ is Sackin's index minus $n - 1$, the
number of internal vertices.

The first few values for each of these functions are:

n    T_α(n)            K_α(n)           L_α(n)
1    1                 0                0
2    4                 1                1
3    8                 3                3
4    12 + 2/(3−α)      5 + 2/(3−α)      5 + 2/(3−α)

Notice that for these small values $K_\alpha(n) = L_\alpha(n)$ and
$T_\alpha(n) - K_\alpha(n) = 2n - 1$. In fact, these relations hold for each
individual tree:

Proposition 49 For any binary rooted tree, $t$, with $n$ leaves:

• $T(t)$ is Sackin's index plus $n$,

• $T(t) - K(t) = 2n - 1$, and

• $K(t) = L(t)$.

Proof. 

Let $s$ be the leaf set of tree $t$, $I$ the set of internal vertices and $r$
the root. Now $T(t)$ is the sum of the distance from the root to each leaf,
$\sum_{v \in s} d(r,v)$, and there are $n$ leaves. Thus it is equal to
$\sum_{v \in s} (d(r,v) - 1) + n$ which is Sackin's index plus $n$.

For the difference between the sum of leaf depths and the sum of internal
node depths: This is true for $n = 1$. Suppose that it is true for all trees
with fewer than $k$ leaves. Given a tree with $k$ leaves, the first split has
$p$ leaves to the 'left' and $q$ leaves to the 'right' ($p + q = k$). The
difference between the total leaf depth and internal node depths on the left is
$(2p - 1) + p - (p - 1) = 2p$, the difference on the right is
$(2q - 1) + q - (q - 1) = 2q$. Adding these together and subtracting 1 for the
depth of the first branching node gives a total difference of
$2p + 2q - 1 = 2k - 1$.

For the third part, if $J$ is the set of internal nodes then let $b(i,j)$ be 1
if $i$ is above $j$ (closer to the root, counting each node as above itself)
and 0 otherwise. Then

$$ \sum_{i \in J} \left( \sum_{j \in J} b(i,j) \right) = \sum_{j \in J} \left( \sum_{i \in J} b(i,j) \right) $$

The left hand side of this equation is the sum of the number of internal nodes
below and including each internal node, and the right hand side is the sum of
the depths of each internal node. □
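The three relations of Proposition 49 can be confirmed by brute force over all small tree shapes. This is a sketch under assumptions not in the original: trees are nested tuples with `None` as a leaf, the depth of the first branch point is 1 (so the root edge is counted), and the function names are hypothetical.

```python
def all_shapes(n):
    """Yield every ordered tree shape with n leaves; None is a leaf."""
    if n == 1:
        yield None
        return
    for i in range(1, n):
        for left in all_shapes(i):
            for right in all_shapes(n - i):
                yield (left, right)


def leaf_count(t):
    return 1 if t is None else leaf_count(t[0]) + leaf_count(t[1])


def sum_leaf_depths(t, depth=1):
    """T(t): sum of d(r, v) over leaves; the top vertex of the tree has depth 1."""
    if t is None:
        return depth
    return sum_leaf_depths(t[0], depth + 1) + sum_leaf_depths(t[1], depth + 1)


def sum_internal_depths(t, depth=1):
    """K(t): sum of d(r, v) over internal vertices."""
    if t is None:
        return 0
    return depth + sum_internal_depths(t[0], depth + 1) + sum_internal_depths(t[1], depth + 1)


def sum_internal_below(t):
    """L(t): sum over internal nodes of the internal nodes below and including it."""
    if t is None:
        return 0
    # A binary subtree with k leaves has k - 1 internal nodes.
    return (leaf_count(t) - 1) + sum_internal_below(t[0]) + sum_internal_below(t[1])


def sackin(t):
    """Sackin's index: sum of N_v over internal vertices."""
    if t is None:
        return 0
    return leaf_count(t) + sackin(t[0]) + sackin(t[1])


if __name__ == "__main__":
    t = ((None, None), None)  # the unique shape with three leaves, up to orientation
    print(sum_leaf_depths(t), sum_internal_depths(t), sum_internal_below(t))
```

For the three-leaf tree this gives T = 8, K = 3 and L = 3, matching the table of small values.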

A recurrence relation for the expected value of L(t) under the alpha model 
is now derived. 

Proposition 50

$$ L_\alpha(n+1) = L_\alpha(n)\frac{n+1}{n-\alpha} + \frac{(2n-1)(1-\alpha)}{n-\alpha} \qquad (4) $$

Proof.

$L_\alpha$ satisfies the recurrence relation:

$$ L_\alpha(n+1) = \frac{n(1-\alpha)}{n-\alpha}\left( L_\alpha(n) + 1 + \frac{L_\alpha(n) + (n-1)}{n} \right) + \frac{\alpha(n-1)}{n-\alpha}\left( L_\alpha(n) + \left(\frac{L_\alpha(n)}{n-1} - 1\right) + \left(\frac{L_\alpha(n)}{n-1} + 1\right) \right) $$

$$ = L_\alpha(n) + \frac{n(1-\alpha)}{n-\alpha} + \frac{1-\alpha}{n-\alpha}\bigl(L_\alpha(n) + (n-1)\bigr) + \frac{\alpha}{n-\alpha}\bigl(2L_\alpha(n)\bigr) $$

$$ = L_\alpha(n)\frac{n+1}{n-\alpha} + \frac{(2n-1)(1-\alpha)}{n-\alpha} $$

The first collection of terms in the group corresponds to a leaf
displacement, which occurs with probability $\frac{n(1-\alpha)}{n-\alpha}$.
When this occurs, all the old nodes are still above the nodes they were above
before, contributing $L_\alpha(n)$. The new internal node has exactly itself
below itself and thus contributes 1. An existing internal node gains this new
internal node as a descendant if it is above the displaced leaf, so this
contribution is equal to the expected number of internal nodes which are above
the displaced leaf. This is equal to the expected depth of the leaf minus 1,
which is $\frac{T_\alpha(n)}{n} - 1 = \frac{L_\alpha(n) + (n-1)}{n}$.

The second collection of terms corresponds to an internal node being
displaced, and occurs with probability $\frac{\alpha(n-1)}{n-\alpha}$. In this
case, all of the old nodes are still above the nodes they were above before,
contributing $L_\alpha(n)$. The number of internal nodes the new node is below
is equal to the number of nodes the one it displaced was below (excepting that
node), for a total expected contribution of $\frac{L_\alpha(n)}{n-1} - 1$. The
new internal node is above all the internal nodes that the node it displaced
was above, plus one for itself (for an expected contribution of
$\frac{L_\alpha(n)}{n-1} + 1$). □
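The recurrence (4) can be compared against a brute-force computation of the expectation. The sketch below is an illustration under stated assumptions, none of which come from the original text: trees are nested tuples with `None` as a leaf, a new leaf is inserted on a leaf edge with weight $1 - \alpha$ or on an internal edge (including the root edge) with weight $\alpha$, the new leaf is always placed on the right since orientation does not affect $L$, and $\alpha < 1$ so that the total weight $n - \alpha$ is never zero. Function names are hypothetical.

```python
from fractions import Fraction


def expected_L_recurrence(alpha, n_max):
    """[L_alpha(1), ..., L_alpha(n_max)] from recurrence (4), with L_alpha(1) = 0."""
    alpha = Fraction(alpha)
    values = [Fraction(0)]
    for n in range(1, n_max):
        values.append(values[-1] * (n + 1) / (n - alpha)
                      + (2 * n - 1) * (1 - alpha) / (n - alpha))
    return values


def insertions(tree, alpha):
    """Yield (weight, new_tree) for each edge of tree where a leaf can be inserted."""
    if tree is None:
        yield 1 - alpha, (None, None)      # leaf edge above this leaf
        return
    yield alpha, (tree, None)              # internal edge above this subtree
    left, right = tree
    for weight, new_left in insertions(left, alpha):
        yield weight, (new_left, right)
    for weight, new_right in insertions(right, alpha):
        yield weight, (left, new_right)


def internal_nodes(tree):
    return 0 if tree is None else 1 + internal_nodes(tree[0]) + internal_nodes(tree[1])


def L_statistic(tree):
    """Sum over internal nodes of the internal nodes below and including each."""
    if tree is None:
        return 0
    return internal_nodes(tree) + L_statistic(tree[0]) + L_statistic(tree[1])


def expected_L_bruteforce(alpha, n_max):
    """Same expectations, by tracking the full distribution over tree shapes."""
    alpha = Fraction(alpha)
    dist = {None: Fraction(1)}             # the single-leaf tree
    expectations = [Fraction(0)]
    for n in range(1, n_max):
        new_dist = {}
        for tree, p in dist.items():
            for weight, new_tree in insertions(tree, alpha):
                new_dist[new_tree] = new_dist.get(new_tree, Fraction(0)) + p * weight / (n - alpha)
        dist = new_dist
        expectations.append(sum(p * L_statistic(t) for t, p in dist.items()))
    return expectations


if __name__ == "__main__":
    for alpha in (Fraction(0), Fraction(1, 2)):
        print(alpha, expected_L_recurrence(alpha, 6), expected_L_bruteforce(alpha, 6))
```

The brute-force version grows quickly with `n_max`, so it is only intended as a check for small trees; the recurrence itself is linear in `n_max`.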



Similar arguments give recurrences for $T_\alpha$ and $K_\alpha$. The
resulting recurrences may also be obtained by substituting the equations in
Proposition 49 into Equation (4).

Notice that $L_\alpha(n)$ is strictly increasing in $n$ for fixed $\alpha$.
This follows as $\frac{n+1}{n-\alpha} > 1$ and
$\frac{(2n-1)(1-\alpha)}{n-\alpha} \geq 0$. Also, $L_\alpha(n)$ is an
increasing function of $\alpha$
for each fixed n, strictly increasing for n > 4. To see this, note that both 
_L_ D0 th. (2n-i)(i-a) > g are strictly increasing in a; thus if LAn — 1) is an 

n— a n—a J ° "\ / 

increasing function of a then so is L a {n). 
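Theorem 51 $L_\alpha(n)$ is $\Theta(n^{1+\alpha})$ for $\alpha \in (0, 1]$.

Proof.

Fix $\alpha \in (0, 1]$. For $\alpha = 1$ the model is the Comb model, for which
equation (4) gives $L_1(n) = n(n-1)/2$ directly, so assume $\alpha < 1$. Begin
by showing that $L_\alpha(n)$ is $o(n^{1+\alpha+\epsilon})$ for all
$\epsilon > 0$.

Let $M(n) = L_\alpha(n)/(1-\alpha)$ and suppose that for some choice of $c$ and
some sufficiently large $n$ it is true that
$M(n-1) \leq c(n-1)^{1+\alpha+\epsilon}$. Then equation (4) gives:

$$ M(n) \leq c(n-1)^{1+\alpha+\epsilon}\frac{n}{n-1-\alpha} + \frac{2n-3}{n-1-\alpha} $$

letting $x = \frac{1}{n}$, this is

$$ c\,n^{1+\alpha+\epsilon}\frac{(1-x)^{1+\alpha+\epsilon}}{1-(1+\alpha)x} + \frac{2-3x}{1-(1+\alpha)x} $$

$$ = c\,n^{1+\alpha+\epsilon}\bigl(1 - \epsilon x + O(x^2)\bigr) + \bigl(2 + (2\alpha-1)x + o(x)\bigr) $$

$$ = c\bigl(n^{1+\alpha+\epsilon} - \epsilon n^{\alpha+\epsilon} + o(n^{\alpha+\epsilon})\bigr) + 2 + (2\alpha-1)\frac{1}{n} + o\!\left(\frac{1}{n}\right) $$

For sufficiently large $n$ this gives:

$$ M(n) \leq c\,n^{1+\alpha+\epsilon} $$

Applying induction starting at this value of $n$ gives the desired result.

Let $A(n) = M(n)/n^{1+\alpha}$. Thus $A(n)$ is $o(n^{\epsilon})$ for all
$\epsilon > 0$. As before, equation (4) leads to:

$$ n^{1+\alpha}A(n) = n^{1+\alpha}\left(1 + O\!\left(\frac{1}{n^2}\right)\right)A(n-1) + 2 + o(1) $$

which rearranges into:

$$ A(n) - A(n-1) = O\!\left(\frac{1}{n^2}\right)A(n-1) + O\!\left(\frac{1}{n^{1+\alpha}}\right) $$

This implies that $A(n)$ is bounded, as $\sum_n \frac{1}{n^k}$ is bounded for
all $k > 1$, in particular for $k = 1 + \alpha$ and $k = 2 - \epsilon$ (for
sufficiently small $\epsilon$). On the other hand, $A(n)$ is positive and is
strictly increasing for sufficiently large $n$ and so is bounded away from 0 by
a definite amount. Thus $A(n)$ is $\Theta(1)$ and so $M(n)$ and $L_\alpha(n)$
are $\Theta(n^{1+\alpha})$. □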

This immediately gives: 

Corollary 52 For $\alpha \in (0, 1]$ the expected value of Sackin's index for a
random alpha tree with $n$ leaves is of order $n^{1+\alpha}$.

Dividing by the total number of leaves, $n$, gives:

Corollary 53 For $\alpha \in (0, 1]$, the expected depth of a random leaf in a
tree chosen from the alpha model with $n$ leaves is $\Theta(n^{\alpha})$.

4.5 Covariance of Sackin's and Colless' Index 

It has just been shown that the mean of $S_n(\alpha)$ is of order
$n^{1+\alpha}$ for all $\alpha \in (0, 1]$. It will now be shown that in fact
$\frac{S_n(\alpha) - C_n(\alpha)}{n^{1+\alpha}}$ converges to 0 in probability
for all $\alpha \in (0, 1]$.

In fact, the values of Sackin's index and Colless' index on any tree of size
$n$ differ by at most $n \log_2 n$. This shortcuts the need for Lemma 3 in
replacing it with an easier and much better result. 

Given a tree shape or cladogram, $T$, define $\nu(T)$ to be the sum over all
internal vertices of the minimum of the number of leaves below each child of
that vertex. In other words, $\nu(T) = \sum_{v \in I} \min(L_v, R_v)$. By
Lemma 48 this is half the difference between Sackin's index and Colless' index
for the tree $T$.

It seems plausible for $\nu(T)$ to take its maximum value over all trees with
a fixed number of leaves at a very 'balanced' tree. A perfectly balanced tree
with $n = 2^k$ leaves has value $\nu(T) = 2^{k-1}k = \frac{n \log_2 n}{2}$. It
also seems reasonable for this tree to have the greatest value over all trees
with at most $2^k$ leaves. This suggests that if $T$ is a tree with $n$ leaves
then the difference between Colless' and Sackin's index, $2\nu(T)$, is at most
$n \log_2 n$.

This turns out to be a good heuristic.

Lemma 54 If $T$ is a tree shape or cladogram with $n$ leaves then the
difference between Colless' and Sackin's index for $T$ is at most
$n \log_2 n$. Specifically $0 \leq S(T) - C(T) \leq n \log_2 n$.

Proof.

Recall that the difference between Sackin's and Colless' index for a tree $T$
is $S(T) - C(T) = 2\sum_{v \in I} \min(L_v, R_v) = 2\nu(T)$ (by Lemma 48),
where $I$ is the set of internal nodes of $T$. Let $f(n)$ be the maximum value
of $\nu(T)$, over all tree shapes (or cladograms), $T$, with $n$ leaves.
Clearly $\nu(T) \geq 0$, and so $f(n) \geq 0$. Now show that
$f(n) \leq \frac{n}{2}\log_2 n$.

The proof is by induction. First, note that for $k = 1, 2, 3$ there is only one
tree shape with $k$ leaves and $\nu(T) = 0, 1, 2$ respectively. These values
are less than or equal to $(1 \log_2 1)/2$, $(2 \log_2 2)/2$, $(3 \log_2 3)/2$
respectively. Thus $f(k) \leq (k \log_2 k)/2$ for $k = 1, 2, 3$.

Suppose that $f(k) \leq (k \log_2 k)/2$ for all $k < n$. Note that $f(n)$
satisfies the recurrence relation
$f(n) = \max_{i \in \{1, 2, \ldots, \lfloor n/2 \rfloor\}} \bigl( f(i) + f(n-i) + i \bigr)$.
This follows as every tree $T$ with first split $\{i, n-i\}$ has $\nu(T)$ equal
to $\min(i, n-i)$ plus the value of the left and right subtrees, which are
bounded above by $f(i)$ and $f(n-i)$ respectively. On the other hand,
$f(i) + f(n-i) + \min(i, n-i)$ is obtained for the tree which is the root join
of trees with $i$ leaves and $n-i$ leaves which maximize $\nu$ for these
numbers of leaves.

Assume without loss of generality that $i \leq \frac{n}{2}$. Thus it is
sufficient to show that for all $1 \leq i \leq \frac{n}{2}$, the following
inequality holds:
$(n \log_2 n)/2 \geq ((n-i)\log_2(n-i))/2 + (i \log_2 i)/2 + i$. In other
words, show that $0 \geq (n-i)\log_2(n-i) + i \log_2 i - n \log_2 n + 2i$.

The second derivative of the right hand side with respect to $i$ is
$\frac{1}{\ln 2}\left(\frac{1}{i} + \frac{1}{n-i}\right)$, which is always
greater than zero. Thus the function is convex. The inequality is true when
$i = 1$, and equality holds when $i = n/2$. Therefore, by convexity, the
inequality holds for all $i$ between 1 and $n/2$.

Thus the inductive step holds and the lemma is proven. □

In other words, for large trees which are not too symmetrical these two
statistics are almost identical.
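The bound in Lemma 54 can be illustrated numerically by evaluating the recurrence for $f(n)$ used in the proof. This is a minimal sketch with a hypothetical function name; it relies only on the recurrence $f(n) = \max_{1 \leq i \leq \lfloor n/2 \rfloor} (f(i) + f(n-i) + i)$ with $f(1) = 0$.

```python
from math import log2


def max_min_sum(n_max):
    """Return a list f with f[n] = max of sum_v min(L_v, R_v) over shapes with n leaves."""
    f = [0, 0]  # f[0] is unused padding; f[1] = 0 for the single-leaf tree
    for n in range(2, n_max + 1):
        f.append(max(f[i] + f[n - i] + i for i in range(1, n // 2 + 1)))
    return f


if __name__ == "__main__":
    f = max_min_sum(32)
    for n in (2, 3, 4, 7, 8, 16, 31, 32):
        print(n, f[n], n * log2(n) / 2)
```

At powers of two the maximum meets the bound exactly, which is the perfectly balanced tree; for other sizes it sits strictly below.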

This leads immediately to the desired result: 

Corollary 55 $\frac{S_n(\alpha) - C_n(\alpha)}{n^{1+\alpha}}$ converges to 0
uniformly (and so in probability) as $n$ approaches $\infty$, for all
$\alpha \in (0, 1]$.

And also:

Corollary 56 $C_n(\alpha) = \Theta(n^{1+\alpha})$, for all $\alpha \in (0, 1]$.

Proof.

This follows directly from the previous Corollary and Corollary 52. □



5 Sweet Cherries 

An easily computed statistic of a cladogram is the number of cherries. A 
cherry is a pair of leaves which are both adjacent to the same internal vertex. 
For example, the balanced rooted tree with 4 leaves has two cherries. 



McKenzie and Steel [18] showed that for the Yule model on rooted trees
and Uniform model on unrooted trees the number of cherries is asymptoti- 
cally normal, with known mean and variance. These results are now extended 
to the alpha model: 

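Theorem 57 If $C_m$ is the number of cherries in a random alpha tree with
$m$ leaves then for $\alpha \in [0, 1)$

$$\frac{C_m - \frac{1-\alpha}{3-2\alpha}\, m}{\sqrt{\frac{(1-\alpha)(2-\alpha)}{(3-2\alpha)^2 (5-4\alpha)}\, m}} \xrightarrow{d} N(0, 1) \quad \text{as } m \to \infty.$$

For $\alpha = 1$ and $m \ge 2$, $C_m$ is identically 1 as a comb tree has only one cherry.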

The proof of Theorem 57 follows the methods in [18]. First, describe the
formation of cherries in terms of an extended Polya urn model and apply a
theorem which proves asymptotic normality. Next, use probability generating
functions to find recurrences for the mean and variance. Finally, solve these
recurrences to find the asymptotic mean and variance.

Along the way, an exact formula for the mean is obtained. The exact mean
and variance have previously been calculated for the Yule model ($\alpha = 0$) in
[THj and for the Uniform model on unrooted binary trees in [27], JE] and
[2*H], with these results collected in [TSj. An exact formula for the variance for
all $\alpha$ may also be possible using the usual techniques for solving recurrence
equations.

Figure 12: A tree with two cherries

5.1 Extended Polya urn models 

This section reviews a recent central limit theorem on extended Polya urn
(EPU) models. This result is used to prove the asymptotic normality of the
number of cherries in a random alpha tree.
First define the urn models.

Suppose an urn contains $k$ different types of balls. If a ball of the $i$-th
type is drawn from the urn then it is returned, along with $A_{ij}$ balls of the
$j$-th type. This value may be negative, corresponding to the removal of
balls from the urn. Models with $A_{ii} \ge 0$ are referred to as generalized Polya
urn (GPU) models [S], [Zj. Allowing for $A_{ii}$ to be negative, but requiring
that the number of balls returned each time be a positive constant, defines
the class of extended Polya urn (EPU) models 0, [2U].

For both of these classes of urn models a number of asymptotic normality 
results exist. The one relevant here (found in 0, [2E|) is as follows: 

Theorem 58 ^ \2b] Let $A = [A_{ij}]$ be the generating matrix for an EPU
model, with principal eigenvalue $\lambda_1$. Let $v$ be the left eigenvector of $A$
corresponding to $\lambda_1$, where the entries $v_i$ add up to one. Also let $Z_{in}$ denote the
number of balls of type $i$ in the urn after $n$ draws, where $i = 1, 2, \ldots, k$. For
$k = 2$ suppose that:

(i) $A$ has constant row sums, where the constant is positive,

(ii) $\lambda_1$ is positive, simple, and has a strictly positive left eigenvector $v$,

(iii) $2\lambda < \lambda_1$ for the non-principal eigenvalue $\lambda$;

then $n^{-1/2}(Z_{1n} - n\lambda_1 v_1)$ has asymptotically a normal distribution with
mean zero.

Furthermore, for $k > 2$, suppose in addition:

(iv) $2\,\mathrm{Re}(\lambda) < \lambda_1$ for all non-principal eigenvalues $\lambda$,

(v) all complex eigenvalues are simple, and no two distinct complex
eigenvalues have the same real part, except for conjugate pairs,

(vi) all eigenvectors are linearly independent;

then $n^{-1/2}(Z_{1n} - n\lambda_1 v_1,\ Z_{2n} - n\lambda_1 v_2,\ \ldots,\ Z_{(k-1)n} - n\lambda_1 v_{k-1})$ has
asymptotically a joint normal distribution with mean zero.

This theorem also applies to the case when the number of balls is a non- 
negative real number rather than a non-negative integer. 
5.2 The number of cherries is asymptotically normal

This section follows the approach in [18] of describing the process of cherry
formation in terms of an extended Polya urn model and applying Theorem 58.

The asymptotic distribution for the number of cherries may be found by
realizing the process of cherry formation as an EPU model. Each new leaf 
is added in the alpha model by choosing an edge at random according to 
weights, breaking the edge in two with a new internal vertex and attaching 
a new leaf edge at that new vertex. An extra cherry is created exactly when 
a leaf edge which is not already part of a cherry is chosen at the point of 
insertion. 

Proposition 59 If $C_m$ is the number of cherries in a random alpha tree with
$m$ leaves then for $\alpha \in [0, 1)$ there exists a variance $\sigma^2$ such that

$$\frac{C_m - m\,\frac{1-\alpha}{3-2\alpha}}{\sigma \sqrt{m}} \xrightarrow{d} N(0, 1).$$



Proof. 

First, realize the creation of cherries as an extended Polya urn.

Let the first type of ball represent leaf edges which are part of a cherry,
the second type of ball represent leaf edges which are not part of a cherry
and the third type of ball represent internal edges. Each non-cherry leaf edge
is represented by a ball of type 2 with weight $1 - \alpha$ and each internal edge
by a ball of type 3 with weight $\alpha$. Each cherry is represented by two balls of
type 1 with total weight $2(1 - \alpha)$, as it consists of two leaf edges.

In this way, the total weight of all balls of a given type is proportional
to the probability that the next leaf is inserted into that type of edge. Note
that the number of cherries is the weight of the first type of ball divided by
$2(1 - \alpha)$.

Now determine what happens when a ball is chosen.

When a new leaf edge is inserted at a leaf edge which is already part of
a cherry, the net effect is to add a new internal edge and a new non-cherry
leaf edge. The same happens when a new leaf edge is inserted at an internal
edge. When a new leaf edge is inserted at a leaf edge which is not part of
a cherry then a new cherry is created, a new internal edge created, and a
non-cherry leaf edge lost. See Figure 13.

Figure 13: The effect of adding a leaf at a cherry leaf edge, an internal edge
and a non-cherry leaf edge



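Recalling the weights chosen above, this means that the generating matrix
for this urn scheme is:

$$A = \begin{pmatrix} 0 & 1-\alpha & \alpha \\ 2-2\alpha & -(1-\alpha) & \alpha \\ 0 & 1-\alpha & \alpha \end{pmatrix}$$

This matrix has eigenvalues 1, 0 and $-2(1-\alpha)$, with corresponding left
eigenvectors proportional to $[2(1-\alpha)^2,\ 1-\alpha,\ \alpha(3-2\alpha)]$, $[1, 0, -1]$ and
$[1, -1, 0]$. As $\alpha \in [0, 1]$ the principal eigenvalue is $\lambda_1 = 1$ and the
corresponding left eigenvector, scaled such that its entries sum to one, is

$$v = \left[ \frac{2(1-\alpha)^2}{3-2\alpha},\ \frac{1-\alpha}{3-2\alpha},\ \alpha \right].$$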
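
The eigenvalue claims for the generating matrix can be verified with exact rational arithmetic. The sketch below assumes the matrix rows are ordered as in the text (cherry leaf edge, non-cherry leaf edge, internal edge) with weighted entries; the helper names are hypothetical.

```python
from fractions import Fraction


def generating_matrix(alpha):
    """Weighted generating matrix of the cherry urn (rows: ball type drawn)."""
    a = Fraction(alpha)
    return [
        [0, 1 - a, a],                # insertion at a cherry leaf edge
        [2 - 2 * a, -(1 - a), a],     # insertion at a non-cherry leaf edge
        [0, 1 - a, a],                # insertion at an internal edge
    ]


def left_multiply(v, A):
    """Row vector v times matrix A."""
    return [sum(v[i] * A[i][j] for i in range(3)) for j in range(3)]


def principal_left_eigenvector(alpha):
    """Left eigenvector for eigenvalue 1, scaled so that its entries sum to one."""
    a = Fraction(alpha)
    return [2 * (1 - a) ** 2 / (3 - 2 * a), (1 - a) / (3 - 2 * a), a]
```

For any rational alpha, `left_multiply(v, A)` returns `v` itself for the principal eigenvector, the zero vector for `[1, 0, -1]`, and `-2(1 - alpha)` times `[1, -1, 0]` for the third eigenvector.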

Thus the conditions of the EPU asymptotics theorem, Theorem 58, are
satisfied. So, for some function $c$ of $\alpha$,

$$m^{-1/2} \left( Z_{1m} - m\, \frac{2(1-\alpha)^2}{3-2\alpha} \right) \xrightarrow{d} N(0, c)$$

where $N(\mu, \sigma^2)$ is a normal distribution with mean $\mu$ and variance $\sigma^2$.

Finally, recall that $Z_{1m}$ is $2(1-\alpha)$ times the total number of cherries.
Therefore the desired result follows. □
5.3 The mean and variance

Recurrence equations are now found for the mean and variance of the number
of cherries under the alpha model. An exact formula for the mean is then
found.

Let $C_m$ be the number of cherries in a random tree shape or cladogram
with $m$ leaves picked according to the alpha model. Let $\mu_m$ be the mean of
$C_m$ and $\sigma^2_m$ the variance. Note that each of these depends on the value of $\alpha$.



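Theorem 60 The following recurrences hold:

$$\mu_{m+1} = \frac{m(1-\alpha)}{m-\alpha} + \frac{m-2+\alpha}{m-\alpha}\, \mu_m$$

$$\sigma^2_{m+1} = \frac{\alpha(1-\alpha)m(m-1)}{(m-\alpha)^2} + \sigma^2_m \left( \frac{m-4+3\alpha}{m-\alpha} \right) + \mu_m \left( \frac{2(1-\alpha)(m(1-2\alpha)+\alpha)}{(m-\alpha)^2} \right) - \mu_m^2\, \frac{4(1-\alpha)^2}{(m-\alpha)^2}$$

Furthermore:

$$\mu_m \sim \frac{1-\alpha}{3-2\alpha}\, m \qquad \text{and} \qquad \sigma^2_m \sim \frac{(1-\alpha)(2-\alpha)}{(3-2\alpha)^2(5-4\alpha)}\, m.$$

This theorem agrees with the corresponding theorems in [18] for the Yule
($\alpha = 0$) and unrooted Uniform models (setting $\alpha = 1/2$).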
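
The recurrences of Theorem 60 are easy to iterate numerically, which gives a quick check of the stated asymptotic constants. This is a minimal sketch in floating point, starting from the two-leaf tree (mean 1, variance 0); the function name `cherry_moments` is illustrative only.

```python
def cherry_moments(m_max, alpha):
    """Iterate the mean and variance recurrences up to m_max leaves.

    Returns (mu, var) for a random alpha tree with m_max leaves.
    """
    mu, var = 1.0, 0.0  # two leaves: exactly one cherry
    for m in range(2, m_max):
        d = m - alpha
        new_mu = m * (1 - alpha) / d + (m - 2 + alpha) / d * mu
        new_var = (
            alpha * (1 - alpha) * m * (m - 1) / d**2
            + var * (m - 4 + 3 * alpha) / d
            + mu * 2 * (1 - alpha) * (m * (1 - 2 * alpha) + alpha) / d**2
            - mu**2 * 4 * (1 - alpha) ** 2 / d**2
        )
        mu, var = new_mu, new_var
    return mu, var
```

For example, with `alpha = 0` the ratios `mu / m` and `var / m` approach 1/3 and 2/45, and with `alpha = 0.5` they approach 1/4 and 1/16, in agreement with the limits in the theorem.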

Proof. 

When inserting a new leaf into a cladogram the number of cherries
increases if and only if the new leaf displaces a leaf which is not already part
of a cherry. If the number of cherries increases then it increases by exactly
one. Thus the variables $C_m$ obey the following recurrence:

$$P[C_{m+1} = k] = P[C_m = k-1]\, \frac{(1-\alpha)(m - 2(k-1))}{m-\alpha} + P[C_m = k]\, \frac{(m-1)\alpha + 2k(1-\alpha)}{m-\alpha}$$

Let $P_m(x) = \sum_{k \ge 0} P[C_m = k]\, x^k$ be the probability generating function
for $C_m$. Thus $P_1(x) = 1$ as the tree with 1 leaf has no cherries, and $P_2(x) = x$
as the two leaf tree has exactly one cherry.
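
The recurrence for $P[C_{m+1} = k]$ can be iterated directly to obtain the exact distribution of the number of cherries for small $m$. The following sketch uses exact fractions and starts from the two-leaf tree; the names `cherry_distribution` and `mean_and_variance` are hypothetical helpers, not part of the original text.

```python
from fractions import Fraction


def cherry_distribution(m_max, alpha):
    """Return {k: P[C_m = k]} for m = m_max (m_max >= 2)."""
    a = Fraction(alpha)
    dist = {1: Fraction(1)}  # C_2 = 1 with probability one
    for m in range(2, m_max):
        new = {}
        for k, p in dist.items():
            # Probability that the new leaf lands on a non-cherry leaf edge.
            up = (1 - a) * (m - 2 * k) / (m - a)
            if up:
                new[k + 1] = new.get(k + 1, 0) + p * up
            new[k] = new.get(k, 0) + p * (1 - up)
        dist = new
    return dist


def mean_and_variance(dist):
    """Mean and variance of a finite distribution given as {value: probability}."""
    mean = sum(k * p for k, p in dist.items())
    second = sum(k * k * p for k, p in dist.items())
    return mean, second - mean**2
```

With `alpha = 0` and four leaves this gives one cherry with probability 2/3 and two cherries with probability 1/3, matching the Yule probabilities of the comb and the balanced tree.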

Now find a recurrence equation for $P_m(x)$. The contribution to $P_{m+1}(x)$
from the first term in the above recurrence is:

$$\frac{m(1-\alpha)}{m-\alpha}\, x P_m(x) - \frac{2(1-\alpha)}{m-\alpha}\, x^2 \frac{d}{dx} P_m(x)$$

The contribution from the second term is:

$$\frac{(m-1)\alpha}{m-\alpha}\, P_m(x) + \frac{2(1-\alpha)}{m-\alpha}\, x \frac{d}{dx} P_m(x)$$

Thus the probability generating functions $P_m(x)$ satisfy the following
recurrence equation:

$$P_{m+1}(x) = \frac{(m-1)\alpha + m(1-\alpha)x}{m-\alpha}\, P_m(x) + \frac{2(1-\alpha)}{m-\alpha}\, x(1-x) \frac{d}{dx} P_m(x) \qquad (6)$$

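Note that $\mu_m = \frac{d}{dx} P_m(x) \big|_{x=1}$ and $\sigma^2_m = \frac{d^2}{dx^2} P_m(x) \big|_{x=1} + \mu_m - \mu_m^2$. For
notational convenience let $P^{(i)}_m(x)$ denote $\frac{d^i}{dx^i} P_m(x)$.
Differentiating equation (6) yields:

$$P^{(1)}_{m+1}(x) = \frac{m(1-\alpha)}{m-\alpha} P_m(x) + \frac{(m-1)\alpha + m(1-\alpha)x}{m-\alpha} P^{(1)}_m(x) + \frac{2(1-\alpha)}{m-\alpha}(1-2x) P^{(1)}_m(x) + \frac{2(1-\alpha)}{m-\alpha}\, x(1-x) P^{(2)}_m(x)$$

Evaluating at $x = 1$, and noting that $P_m(1) = 1$ for all $m$, gives:

$$\mu_{m+1} = \frac{m(1-\alpha)}{m-\alpha} + \frac{m-2+\alpha}{m-\alpha}\, \mu_m \qquad (7)$$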

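There is one tree with two leaves and it has one cherry so $\mu_2 = 1$. By
Proposition 59, $\mu_m \sim \frac{1-\alpha}{3-2\alpha} m$, so a direct solution is not presented here.
Differentiating equation (6) a second time gives:

$$P^{(2)}_{m+1}(x) = P^{(1)}_m(x)\, \frac{m(1-\alpha)}{m-\alpha} + P^{(1)}_m(x)\, \frac{m(1-\alpha) - 4(1-\alpha)}{m-\alpha} + P^{(2)}_m(x)\, \frac{(m-1)\alpha + m(1-\alpha)x + 2(1-\alpha)(1-2x)}{m-\alpha} + P^{(2)}_m(x)\, \frac{2(1-\alpha)}{m-\alpha}(1-2x) + P^{(3)}_m(x)\, \frac{2(1-\alpha)}{m-\alpha}\, x(1-x)$$

Let $s_m = \frac{d^2}{dx^2} P_m(x) \big|_{x=1}$ so that $\sigma^2_m = s_m + \mu_m - \mu_m^2$.
Evaluating this second-derivative equation at $x = 1$ gives:

$$s_{m+1} = \mu_m\, \frac{2m(1-\alpha) - 4(1-\alpha)}{m-\alpha} + s_m\, \frac{(m-1)\alpha + m(1-\alpha) - 4(1-\alpha)}{m-\alpha} = \mu_m\, \frac{2(m-2)(1-\alpha)}{m-\alpha} + s_m\, \frac{m-4+3\alpha}{m-\alpha}$$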

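Equation (7) and $\sigma^2_m = s_m + \mu_m - \mu_m^2$ give:

$$\begin{aligned}
s_{m+1} &= \sigma^2_{m+1} - \mu_{m+1} + \mu^2_{m+1} \\
&= \sigma^2_{m+1} - \left( \frac{m(1-\alpha)}{m-\alpha} + \frac{m-2+\alpha}{m-\alpha}\, \mu_m \right) + \left( \frac{m(1-\alpha)}{m-\alpha} + \frac{m-2+\alpha}{m-\alpha}\, \mu_m \right)^2 \\
&= \sigma^2_{m+1} + \frac{\alpha(1-\alpha)m(1-m)}{(m-\alpha)^2} + \frac{(m-2+\alpha)(m-2m\alpha+\alpha)}{(m-\alpha)^2}\, \mu_m + \frac{(m-2+\alpha)^2}{(m-\alpha)^2}\, \mu_m^2
\end{aligned} \qquad (8)$$

Substituting for $s_{m+1}$ and $s_m$ gives:

$$\begin{aligned}
\sigma^2_{m+1} &= \mu_m\, \frac{2(m-2)(1-\alpha)}{m-\alpha} + \frac{m-4+3\alpha}{m-\alpha} \left( \sigma^2_m - \mu_m + \mu_m^2 \right) \\
&\quad - \frac{\alpha(1-\alpha)m(1-m)}{(m-\alpha)^2} - \frac{(m-2+\alpha)(m-2m\alpha+\alpha)}{(m-\alpha)^2}\, \mu_m - \frac{(m-2+\alpha)^2}{(m-\alpha)^2}\, \mu_m^2 \\
&= \frac{\alpha(1-\alpha)m(m-1)}{(m-\alpha)^2} + \sigma^2_m \left( \frac{m-4+3\alpha}{m-\alpha} \right) + \mu_m \left( \frac{2(1-\alpha)(m-2\alpha m+\alpha)}{(m-\alpha)^2} \right) - \mu_m^2 \left( \frac{4(1-\alpha)^2}{(m-\alpha)^2} \right)
\end{aligned} \qquad (9)$$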



From Proposition 59, $\mu_m = \frac{1-\alpha}{3-2\alpha} m + r(m)$ and $\sigma^2_m = cm + p(m)$, where $r$
and $p$ are $o(m)$ and $c$ is some constant depending on $\alpha$.

Substituting this into equation (9) and multiplying by $(m-\alpha)^2(3-2\alpha)^2$
gives a quadratic in $m$ which must equal zero. As $r$ and $p$ are $o(m)$, the
coefficient of $m^2$ must tend to zero:

$$c(3-2\alpha)^2(4\alpha-5) - (p(m+1) - p(m))(3-2\alpha)^2 + (1-\alpha)(2-\alpha) \to 0$$

This gives:

$$-\frac{p(m+1) - p(m)}{5-4\alpha} + \frac{(1-\alpha)(2-\alpha)}{(3-2\alpha)^2(5-4\alpha)} \to c$$

As $c$ is a constant (for fixed $\alpha$) this means that $p(m+1) - p(m)$ must have
a limit, which can only be 0 as $p = o(m)$. Thus

$$c = \frac{(1-\alpha)(2-\alpha)}{(3-2\alpha)^2(5-4\alpha)}$$

□

It would of course be nice to know exactly how fast the convergence of
$\mu_m$ and $\sigma^2_m$ is. More explicit formulae are given below.

Corollary 61 For $m \ge 3$, $\alpha \in [0, 1)$, the expected number of cherries in a
random alpha tree with $m$ leaves is:

$$\mu_m = \frac{1-\alpha}{3-2\alpha}(m-\alpha) + \frac{\alpha}{2} + \frac{\alpha}{2(3-2\alpha)} \prod_{i=3}^{m-1} \frac{i-2+\alpha}{i-\alpha}$$



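Proof.

Let $\mu_m = \frac{1-\alpha}{3-2\alpha}(m-\alpha) + \frac{\alpha}{2} + X_m$.
Then

$$\begin{aligned}
\mu_{m+1} &= \frac{m(1-\alpha)}{m-\alpha} + \frac{m-2+\alpha}{m-\alpha} \left( \frac{1-\alpha}{3-2\alpha}(m-\alpha) + \frac{\alpha}{2} + X_m \right) \\
&= \frac{1-\alpha}{3-2\alpha}(m+1-\alpha) - (1-\alpha) + \frac{m(1-\alpha)}{m-\alpha} + \frac{\alpha}{2} \left( 1 - \frac{2(1-\alpha)}{m-\alpha} \right) + \frac{m-2+\alpha}{m-\alpha}\, X_m \\
&= \frac{1-\alpha}{3-2\alpha}(m+1-\alpha) + \frac{m(1-\alpha) - (m-\alpha)(1-\alpha) - \alpha(1-\alpha)}{m-\alpha} + \frac{\alpha}{2} + \frac{m-2+\alpha}{m-\alpha}\, X_m \\
&= \frac{1-\alpha}{3-2\alpha}(m+1-\alpha) + \frac{\alpha}{2} + \frac{m-2+\alpha}{m-\alpha}\, X_m
\end{aligned}$$

So $X_{m+1} = \frac{m-2+\alpha}{m-\alpha} X_m$, and so for $m \ge 1$ (and $\alpha \ne 1$)

$$\mu_m = \frac{1-\alpha}{3-2\alpha}(m-\alpha) + \frac{\alpha}{2} + X_1 \prod_{i=1}^{m-1} \frac{i-2+\alpha}{i-\alpha}$$

and $X_1 = -\frac{1-\alpha}{3-2\alpha}(1-\alpha) - \frac{\alpha}{2} = \frac{\alpha-2}{2(3-2\alpha)}$. As $X_3 = \frac{\alpha}{2(3-2\alpha)}$, a more pleasing
formula for $m \ge 3$ and all values of $\alpha$ is:

$$\mu_m = \frac{1-\alpha}{3-2\alpha}(m-\alpha) + \frac{\alpha}{2} + \frac{\alpha}{2(3-2\alpha)} \prod_{i=3}^{m-1} \frac{i-2+\alpha}{i-\alpha}$$

□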



For rational values of $\alpha$ the product term telescopes. In the case of $\alpha = 0$,
the Yule model, the expected number of cherries is $\mu_m = \frac{m}{3}$. In the case of
$\alpha = \frac{1}{2}$, the Uniform distribution on cladograms, the expected number of
cherries is $\mu_m = \frac{m(m-1)}{2(2m-3)}$.

Note that this second value differs slightly from the numbers given in [18]
and [16] as the uniform trees considered there are unrooted.
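
The closed form for the mean can be compared against the recurrence for $\mu_m$ using exact rational arithmetic. This is a small sketch under the assumption that the recurrence starts from $\mu_2 = 1$; both function names are illustrative.

```python
from fractions import Fraction


def mean_by_recurrence(m, alpha):
    """Expected number of cherries from the recurrence, starting at mu_2 = 1."""
    a = Fraction(alpha)
    mu = Fraction(1)
    for j in range(2, m):
        mu = j * (1 - a) / (j - a) + (j - 2 + a) / (j - a) * mu
    return mu


def mean_closed_form(m, alpha):
    """Expected number of cherries from the product formula (m >= 3, alpha < 1)."""
    a = Fraction(alpha)
    prod = Fraction(1)
    for i in range(3, m):  # i = 3, ..., m - 1
        prod *= (i - 2 + a) / (i - a)
    return (1 - a) / (3 - 2 * a) * (m - a) + a / 2 + a / (2 * (3 - 2 * a)) * prod
```

For `alpha = 0` the closed form reduces to `m / 3`, and for `alpha = 1/2` the product telescopes to `3 / (2m - 3)`, giving `m(m - 1) / (2(2m - 3))`.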

6 The shape of evolution: Treebase and the big picture

6.1 Questions about shape 

This section addresses the shape of phylogenetic trees found in nature and
possible biases in common reconstruction techniques. The recent increase in
protein and nucleotide sequence data and availability of programs for
reconstructing phylogeny from such data has led to a large number of published
phylogenies. Many of these phylogenetic trees have been made available in
online databases, such as Treebase.

Some natural questions that arise are: How asymmetrical are the trees 
found in nature? Do they follow some nice probability distribution and if so 
what is it? Are all trees about the same shape? Are there systematic biases 
in different reconstruction techniques? An excellent discussion of these issues 
is given by Mooers and Heard [TU] . 

The question of the 'amount of asymmetry' in natural trees is often raised.
One major stumbling block in a systematic analysis of tree shapes has been
the absence of a good measure of imbalance. Heard's analysis jTJj of 208
published phylogenies is hampered by exactly this problem. Several measures
of tree imbalance have been considered in the past such as "Colless's I"
and "Sackin's index" (see Section HI for a description of these statistics).
Unfortunately these statistics change greatly with the number of leaves, and
have means and variances depending on the probability distribution chosen
(see [22] for example).

It has often been observed that phylogenetic trees found in nature are in
general more symmetric than Uniform trees but not as symmetric as Yule
trees (for example JH],CH]). This observation is verified and quantified here
by examining the distribution over the trees in Treebase of the maximum
likelihood estimate of the parameter in the alpha model. The median of
these estimates is about $\alpha = 0.22$, between the Yule ($\alpha = 0$) and
Uniform ($\alpha = 0.5$) models.

A variety of statistics are used to measure how close the data fits the 
alpha model. Combined p-values are used to reject the hypothesis that the 
trees in Treebase all fit with the alpha model. 

Other than, perhaps, [T3j this appears to be the first systematic analysis
of the shape and balance of a large number of published phylogenies.

6.2 Estimating alpha 

The probability of a given tree shape under the alpha model is a rational
function of alpha, and may be easily computed. By Lemma 123 the alpha
model is Markovian self-similar with conditional split distribution

$$q_\alpha(a, b) = \frac{\Gamma_\alpha(a)\, \Gamma_\alpha(b)}{\Gamma_\alpha(a+b)} \left( \frac{\alpha}{2} \binom{a+b}{a} + (1-2\alpha) \binom{a+b-2}{a-1} \right)$$

where $\Gamma_\alpha(n) = (n-1-\alpha)(n-2-\alpha) \cdots (2-\alpha)(1-\alpha)$ and $\Gamma_\alpha(1) = 1$.
By Proposition |2H] the probability of a tree shape under such a model is
the product of the conditional split probability at each branch-point. See
Section 13*31 for more details and an example.
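
The product over branch-points can be sketched in a few lines. The code below is an illustration only: it represents a tree shape as nested pairs with any non-tuple value standing for a leaf (a representation chosen here, not taken from the text), and it assumes the split distribution $q_\alpha(a,b) = \frac{\Gamma_\alpha(a)\Gamma_\alpha(b)}{\Gamma_\alpha(a+b)}\bigl(\frac{\alpha}{2}\binom{a+b}{a} + (1-2\alpha)\binom{a+b-2}{a-1}\bigr)$ with $\alpha < 1$. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def gamma_alpha(n, a):
    """Gamma_alpha(n) = (n-1-a)(n-2-a)...(1-a), with Gamma_alpha(1) = 1."""
    out = Fraction(1)
    for j in range(1, n):
        out *= j - a
    return out


def q_ordered(x, y, a):
    """Ordered split probability q_alpha(x, y); requires a < 1."""
    g = gamma_alpha(x, a) * gamma_alpha(y, a) / gamma_alpha(x + y, a)
    return g * (a / 2 * comb(x + y, x) + (1 - 2 * a) * comb(x + y - 2, x - 1))


def q_unordered(x, y, a):
    """Unordered split probability q_alpha{x, y}."""
    if x == y:
        return q_ordered(x, y, a)
    return q_ordered(x, y, a) + q_ordered(y, x, a)


def leaves(shape):
    """Number of leaves of a nested-pair tree shape."""
    if not isinstance(shape, tuple):
        return 1
    return leaves(shape[0]) + leaves(shape[1])


def shape_probability(shape, alpha):
    """Probability of a tree shape: product of split probabilities at branch-points."""
    a = Fraction(alpha)
    if not isinstance(shape, tuple):
        return Fraction(1)
    left, right = shape
    return (
        q_unordered(leaves(left), leaves(right), a)
        * shape_probability(left, a)
        * shape_probability(right, a)
    )
```

As a usage example, `shape_probability((((1, 1), 1), 1), 0)` is the Yule probability of the four-leaf comb and `shape_probability(((1, 1), (1, 1)), 0)` that of the balanced four-leaf tree; the two sum to one.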

In the analysis presented here the probability of the tree shape was cal- 
culated for 1000 equally spaced values of a in [0,1]. The maximum over 
these 1000 points was then taken as a good approximation of the maximum 
likelihood for alpha. 
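
A grid search of this kind might look as follows. This is a sketch under stated assumptions rather than the code used for the analysis: the tree shape is a nested pair structure with non-tuple leaves, the grid is taken as $k/1000$ for $k = 0, \ldots, 999$ (the endpoint $\alpha = 1$ is left out because the product form used here divides by $\Gamma_\alpha$, which vanishes there), and the split distribution is the two-binomial formula for $q_\alpha$. The names `split_prob`, `likelihood` and `grid_mle` are illustrative.

```python
from math import comb


def split_prob(x, y, a):
    """Unordered split probability q_alpha{x, y}; requires 0 <= a < 1."""

    def gamma(n):
        out = 1.0
        for j in range(1, n):
            out *= j - a
        return out

    def ordered(u, v):
        weight = a / 2 * comb(u + v, u) + (1 - 2 * a) * comb(u + v - 2, u - 1)
        return gamma(u) * gamma(v) / gamma(u + v) * weight

    return ordered(x, y) if x == y else ordered(x, y) + ordered(y, x)


def likelihood(shape, a):
    """Return (number of leaves, probability of the shape) for parameter a."""
    if not isinstance(shape, tuple):
        return 1, 1.0
    n_left, p_left = likelihood(shape[0], a)
    n_right, p_right = likelihood(shape[1], a)
    return n_left + n_right, split_prob(n_left, n_right, a) * p_left * p_right


def grid_mle(shape, points=1000):
    """Grid approximation to the maximum likelihood estimate of alpha."""
    grid = [k / points for k in range(points)]
    return max(grid, key=lambda a: likelihood(shape, a)[1])
```

A perfectly balanced tree is estimated at the Yule end of the grid and a comb at the upper end, which matches the interpretation of alpha as a measure of imbalance.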

Two transformations to the set of trees were made before estimating
alpha. The first was to remove all non-binary trees as they are not
covered by the model (and probably indicate insufficient data to reconstruct
a tree ^1]). The second was to accommodate the fact that most published
trees contain an outgroup.

In many phylogenetic reconstructions, an outgroup is used to locate the
root on a reconstructed tree, as many algorithms give unrooted trees or
unsure root positions. An outgroup is a singleton or pair (or more) of taxa
which are artificially chosen to be significantly different from the rest of the
taxa in the analysis. These are then used to root the reconstructed tree as it
is assumed that the first speciation event separates the outgroup taxa from
the main group, sometimes called the ingroup.

The addition of outgroups in this manner is expected to increase the
average imbalance of trees and the maximum likelihood estimate for alpha.
To avoid this bias, all trees were split at the root into two separate trees. Trees
of size 3 or less were all discarded. In the event that a tree was constructed
without an outgroup this should not greatly affect the estimate of alpha,
particularly if the tree shape obeys a Markovian self-similar model (as seems
evolutionarily plausible). Almost all trees in the sample set appeared to have
an outgroup.

The median values for the maximum likelihood estimates for alpha before
and after this splitting at the root were about 0.37 and 0.22 respectively.
Thus, removal of the outgroup does significantly affect the estimation of
alpha. This is to be expected for trees of the size most present in Treebase.
Estimation for larger trees should be less affected by the presence of an
outgroup.

Figure 14 shows a histogram of the maximum likelihood estimates for
alpha, categorized by reconstruction method. All trees with fewer than 10
leaves were discarded as for small trees the number of different shapes is too
small to allow for a fine estimate of alpha. The number of trees remaining
was 761.
[Figure 14 panels: Alpha MLE distribution for all trees, bayesian trees, parsimony trees, maxlike trees and neighborjoining trees; horizontal axis: Alpha MLE]

Figure 14: Maximum likelihood estimates of alpha for trees with at least 10
leaves (outgroups removed)



Treebase entries also include the method of reconstruction in most cases.
Here are summary statistics, with a break-down by reconstruction method.

Method             # trees   Min.     1st Qu.   Median   Mean     3rd Qu.   Max.
all                761       0.0000   0.0800    0.2200   0.2536   0.3900    0.9900
parsimony          387       0.0000   0.1050    0.2300   0.2565   0.3800    0.9900
maxlike            107       0.0000   0.1000    0.2300   0.2545   0.4100    0.7400
neighbor joining   76        0.0000   0.0975    0.2150   0.2361   0.3225    0.9200
bayesian           21        0.0000   0.0200    0.1000   0.1262   0.2100    0.4100
unknown            170

Note that the median is consistently around 0.22 (except for the bayesian
method). The number of trees with maximum likelihood estimate for alpha
strictly between 0 and 0.5 is 511 out of a possible 761 (about 67%).

Applying a t-test to the estimates for parsimony and bayesian methods
gives a p-value of $5.943 \times 10^{-5}$ (degrees of freedom = 26.494). This indicates a
strong differential bias between these two reconstruction methods. However,
it should be noted that not all methods were applied to all data. It may be that
phylogenists working on different types of organism with different average
tree shapes prefer one reconstruction method over the other. In order
to do a fully systematic study each method should be applied to the original
sequence data where it is available.

The large spike at alpha = 0 (about 20% of the trees) is discussed in the
next section.

6.3 Does the data fit the model 

This section covers the question of how well the data fits the model. This 
is addressed using p-value data for a number of different statistics on trees. 
As an explicit model is being tested there is no longer a problem with using 
statistics which change with the number of leaves. 

Given a statistic, model and tree, the distribution of the statistic under
the model can be compared with the statistic on the given tree, to give
a p-value. If the trees are generated by the model then such p-values are
uniformly distributed (at least for continuous distributions).

For each tree, and statistic, this p-value was estimated by generating 1000
random trees from the model (with the MLE value of alpha) to approximate
the distribution under the model. This estimate has the correct mean, and
a variance of at most $\frac{1}{4 \times 1000}$.
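
A simulation-based p-value of this kind can be sketched for the simplest statistic, the number of cherries, without building whole trees: under the alpha model the cherry count evolves as a Markov chain in which a tree with $j$ leaves and $c$ cherries gains a cherry with probability $(1-\alpha)(j-2c)/(j-\alpha)$. The one-sided definition of the p-value used below (the fraction of simulated values not exceeding the observed one) is an assumption made for illustration, as are the function names and the fixed seed.

```python
import random


def simulate_cherries(m, alpha, rng):
    """Simulate the number of cherries of a random alpha tree with m >= 2 leaves."""
    c = 1  # the two-leaf tree has one cherry
    for j in range(2, m):
        # A new cherry appears when a non-cherry leaf edge is chosen.
        if rng.random() < (1 - alpha) * (j - 2 * c) / (j - alpha):
            c += 1
    return c


def estimate_p_value(observed, m, alpha, samples=1000, seed=0):
    """Monte Carlo estimate of P[C_m <= observed] under the alpha model."""
    rng = random.Random(seed)
    hits = sum(simulate_cherries(m, alpha, rng) <= observed for _ in range(samples))
    return hits / samples
```

Each simulated indicator is a Bernoulli variable, so the estimate is a binomial proportion whose variance is at most 1/4 divided by the number of samples.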

The statistics used are "Colless' I", "number of cherries" (pairs of adjacent
leaves), "total depth of all leaves" (Sackin's Index) (equivalently: average
leaf depth), "maximum depth of a leaf", and the probability (considered as
a function on the set of trees of a fixed size). See Sections |U and El for more
details on these statistics.

Figure 15 shows scatter plots of these p-values against the estimate of
alpha. Figure 16 shows qq-plots of these p-values against uniform [0, 1].

Looking at these plots it is clear that while the model is not terrible, it 
is certainly not a perfect fit. The lack of extreme p-values for alpha in (0, 1) 
might be explained by extreme trees being better fit by other values of alpha 
where their shape is not so unusual or extreme. This may also explain some 
of the large spike at $\alpha = 0$, which comprises about 20% of all the trees.
[Figure 15 panels: scatter plots of prob rank, tdepth rank, cherry rank and colless rank vs alpha mle for all trees]

Figure 15: P-values for various statistics plotted against alpha MLE

[Figure 16 panels: qq-plots of prob, tdepth and mdepth for all trees]

Figure 16: Q-Q plots for various statistics
7 Tree shapes with up to 7 leaves



This appendix contains a list of all tree shapes with up to 7 leaves, ordered 
lexicographically. The number of phylogenetic trees with a given shape is 
stated, as well as the probability of this shape under the alpha model (con- 
ditional on the tree having that many leaves). 

The probability of a tree shape $T$ under the alpha model is, by Proposition
EH1, equal to:

$$\prod_{\{a, b\} \in \mathrm{splits}(T)} q_\alpha\{a, b\}$$

Recall that $q_\alpha\{a, b\} = q_\alpha(a, b) + q_\alpha(b, a)$ if $a \ne b$ and $q_\alpha\{a, a\} = q_\alpha(a, a)$.
Equation El provides the split distribution, $q_\alpha$, of the alpha model.

If $A(n)$ is the number of tree shapes with exactly $n$ leaves then $A(n)$
satisfies the following recurrence relations:
$A(2n+1) = \sum_{i=1}^{n} A(i) A(2n+1-i)$,
$A(2n) = \sum_{i=1}^{n-1} A(i) A(2n-i) + A(n)(A(n)-1)/2 + A(n)$.
Set $A(0) = 0$ for convenience and $A(1) = 1$. Thus, the first few values of $A(n)$,
starting from $n = 0$, are 0, 1, 1, 1, 2,
3, 6, 11, 23, 46, 98, 207, 451, 983, 2179, 4850, 10905, 24631, 56011, 127912,
293547, 676157.
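
The recurrence translates directly into a short program. The sketch below is illustrative (the function name `tree_shape_counts` is not from the text) and merges the last two terms of the even case into $A(n)(A(n)+1)/2$, the number of unordered pairs of shapes with $n$ leaves each.

```python
def tree_shape_counts(n_max):
    """Return [A(0), A(1), ..., A(n_max)], the number of tree shapes by leaf count."""
    A = [0, 1]
    for n in range(2, n_max + 1):
        half = n // 2
        if n % 2:
            # Odd n: the two root subtrees always have different sizes.
            total = sum(A[i] * A[n - i] for i in range(1, half + 1))
        else:
            # Even n: unequal sizes, plus unordered pairs of shapes of size n/2.
            total = sum(A[i] * A[n - i] for i in range(1, half))
            total += A[half] * (A[half] + 1) // 2
        A.append(total)
    return A
```

Calling `tree_shape_counts(7)` lists the counts for the shapes covered in this appendix.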

This is sequence A001190 in the On-Line Encyclopedia of Integer Sequences. The
generating function, $G(x)$, of sequence A001190 satisfies the functional
equation $G(x) = x + (1/2)(G(x)^2 + G(x^2))$.




Figure 17: The trivial two-leaf tree, treeshape (1, 1).
There is 1 phylogenetic tree with this shape.
The probability of this shape under the alpha model is 1.

Figure 18: The unique three-leaf shape, treeshape (2, 1).
There are $\frac{3!}{2} = 3$ phylogenetic trees with this shape.
The probability of this shape under the alpha model is 1.

Figure 19: The four-leaf comb, treeshape (3, 1).
There are $\frac{4!}{2} = 12$ phylogenetic trees with this shape.
The probability of this shape under the alpha model is $\frac{2}{3-\alpha}$.

Figure 20: Treeshape (3, 2), sequence 4211211 or 4.
There are $\frac{4!}{8} = 3$ phylogenetic trees with this shape.
The probability of this shape under the alpha model is $\frac{1-\alpha}{3-\alpha}$.




Figure 21: Treeshape (4, 1), sequence 543211111 or 543.
There are $\frac{5!}{2} = 60$ phylogenetic trees with this shape.
The probability of this shape under the alpha model is $\frac{2(2+\alpha)}{(4-\alpha)(3-\alpha)}$.

Figure 22: Treeshape (4, 2), sequence 542112111 or 54.
There are $\frac{5!}{8} = 15$ phylogenetic trees with this shape.
The probability of this shape under the alpha model is $\frac{(2+\alpha)(1-\alpha)}{(4-\alpha)(3-\alpha)}$.

Figure 23: Treeshape (4, 3), sequence 532111211 or 53.
There are $\frac{5!}{4} = 30$ phylogenetic trees with this shape.
The probability of this shape under the alpha model is $\frac{2(1-\alpha)}{4-\alpha}$.



Figure 24: Treeshape (5, 1), sequence 65432111111 or 6543.
There are $\frac{6!}{2} = 360$ phylogenetic trees with this shape.
The probability of this shape under the alpha model is $\frac{4(1+\alpha)(2+\alpha)}{(5-\alpha)(4-\alpha)(3-\alpha)}$.

Figure 25: Treeshape (5, 2), sequence 65421121111 or 654.
There are $\frac{6!}{8} = 90$ phylogenetic trees with this shape.
The probability of this shape under the alpha model is $\frac{2(1+\alpha)(2+\alpha)(1-\alpha)}{(5-\alpha)(4-\alpha)(3-\alpha)}$.

Figure 26: Treeshape (5, 3), sequence 65321112111 or 653.
There are $\frac{6!}{4} = 180$ phylogenetic trees with this shape.
The probability of this shape under the alpha model is $\frac{4(1+\alpha)(1-\alpha)}{(5-\alpha)(4-\alpha)}$.

Figure 27: Treeshape (5, 4), sequence 64321111211 or 643.
There are $\frac{6!}{4} = 180$ phylogenetic trees with this shape.
The probability of this shape under the alpha model is $\frac{2(1-\alpha)(8-\alpha)}{(5-\alpha)(4-\alpha)(3-\alpha)}$.

Figure 28: Treeshape (5, 5), sequence 64211211211 or 64.
There are $\frac{6!}{16} = 45$ phylogenetic trees with this shape.
The probability of this shape under the alpha model is $\frac{(1-\alpha)^2(8-\alpha)}{(5-\alpha)(4-\alpha)(3-\alpha)}$.

Figure 29: Treeshape (5, 6), sequence 63211132111 or 633.
There are $\frac{6!}{8} = 90$ phylogenetic trees with this shape.
The probability of this shape under the alpha model is $\frac{2(1-\alpha)(2-\alpha)}{(5-\alpha)(4-\alpha)}$.